Background & Summary

HHEX demonstrates evolutionary conservation in directing endodermal specification, with versatile regulatory functions observed across species1,2. Vertebrate studies reveal its essential role in both anterior-posterior axis establishment and the commitment of hepatic and pancreatic lineages through spatiotemporal modulation of endodermal progenitor niches3,4,5,6,7. HHEX also plays a pivotal role in modulating epithelial biology, influencing hepatic ductal morphogenesis, biliary epithelial functional development, and endothelial cell regulation8,9,10. For example, loss of HHEX expression causes embryonic lethality at embryonic day 10.5 (E10.5) in mice and at day 18 in porcine embryos, confirming its indispensable role in endodermal organogenesis11.

The intestine, a quintessential endodermal organ, cooperates with the liver and pancreas to regulate systemic metabolism12,13. During vertebrate embryogenesis, pancreatic progenitors originate from intestinal endodermal epithelia, subsequently forming dorsal and ventral pancreatic buds14,15. Notably, HHEX is indispensable for ventral bud specification: HHEX-knockout mice exhibit complete ventral pancreas agenesis16,17. Moreover, recent studies have identified that CK2-mediated interactions between HHEX and the YAP-TEAD complex promote colorectal carcinogenesis18. These findings underscore HHEX’s roles in regulating intestinal endodermal development. Nevertheless, the precise cellular and molecular mechanisms by which HHEX modulates gut epithelial differentiation remain poorly characterized. Accounting for approximately 60% of midgut cells, enterocytes (ECs) represent the predominant epithelial population, so HHEX-mediated regulation of EC development is likely to be an essential determinant of intestinal morphogenesis and homeostasis19,20,21.

The Drosophila midgut shares structural and functional homology with mammalian intestines and exhibits a conserved cellular composition, including intestinal stem cells (ISCs), ECs, and enteroendocrine cells (EEs). Furthermore, the Drosophila GAL4-UAS system offers unparalleled precision for EC-specific HHEX perturbation, enabling direct investigation of its cell-autonomous effects on EC differentiation22,23,24. These attributes collectively establish the Drosophila midgut as an ideal model for dissecting HHEX’s role in gut biology25,26,27.

We generated an NP1-Gal4 > UAS-HHEX RNAi strain to achieve EC-specific knockdown of HHEX (HHEX-KD). We then performed single-cell RNA sequencing (scRNA-seq) to investigate the functional impact of HHEX on the Drosophila midgut. First, HHEX-KD larvae exhibited shortened midguts without significant changes in body weight (Fig. 1c–e). Subsequent scRNA-seq analysis revealed that HHEX-KD disrupted EC differentiation, as evidenced by reduced mature EC populations, impaired differentiation of midgut primordium (MP) into ECs, and the emergence of HHEX-knockdown-induced cells (HICs).

Fig. 1

Experimental Design and Drosophila Stock Construction. (a) Schematic workflow of the study. Ctrl group: NP1-Gal4 > UAS-mCherry RNAi; KD group: NP1-Gal4 > UAS-HHEX RNAi KD. (b) RT-qPCR validation of HHEX knockdown efficiency in dissected midguts. HHEX mRNA levels were significantly reduced in KD compared to Ctrl (unpaired two-tailed t-test, with P value < 0.05 considered statistically significant (*)). (c,d) Midgut length comparison between KD and Ctrl groups. KD midguts (n = 30) were shorter than Ctrl (n = 30). Scale bars: 1 mm. (e) No significant difference in larval body weight between groups (t-test, P value > 0.05). (f) UMAP visualization of HHEX expression from scRNA-seq data. Enterocytes (ECs) in Ctrl showed higher HHEX expression levels compared to KD. Each point represents one cell; cells colored white (low) to blue (high HHEX); annotated ECs were demarcated with solid blue outlines.

Our study establishes the first single-cell transcriptomic atlas of HHEX perturbation in Drosophila L3 midgut epithelium. The data suggest that midgut homeostasis during this developmental stage may involve adaptive mechanisms that preserve digestive capacity under HHEX-associated differentiation constraints. They further suggest that HHEX may help coordinate midgut epithelial differentiation dynamics and possibly participate in establishing endodermal organ differentiation paradigms. This dataset serves as a foundational resource for exploring conserved regulatory principles of intestinal homeostasis across species.

Methods

The overall research pipeline of this study is illustrated in Fig. 1a. We constructed a Drosophila EC-specific HHEX KD strain and performed scRNA-seq, followed by analysis using a well-established analytical workflow for single-cell transcriptomics. This approach enabled comprehensive annotation of major cell types in the third-instar larval midgut of Drosophila. We systematically compared subtype proportions, population sizes, and gene expression profiles between the control group and the HHEX-KD group.

EC-specific HHEX knockdown strain generation and third-instar larval cultivation

The following strains were used: (1) White Dahomey (wDah); NP1-Gal4/CyO (a gift from Dr. Yuxuan Lyu, Southern University of Science and Technology); (2) UAS-HHEX-shRNA-attP40/CyO (TH04118.N; Tsinghua Fly Center); (3) yv; UAS-mCherry-shRNA-attP2 (BDSC #35785; Bloomington Drosophila Stock Center). To investigate the role of HHEX in regulating larval midgut EC development, we knocked down HHEX using a gut EC-specific Gal4 driver, NP1-Gal4 (also known as Myo31DF-Gal4)22,25,28. Flies carrying NP1-Gal4 were crossed with either UAS-HHEX-shRNA-attP40/CyO or yv; UAS-mCherry-shRNA-attP2 lines to generate experimental groups:

KD group: wDah; NP1-Gal4/CyO (male) × UAS-HHEX-shRNA-attP40/CyO (female).

Ctrl group: wDah; NP1-Gal4/CyO (male) × yv; UAS-mCherry-shRNA-attP2 (female).

Flies were maintained at 25 °C on standard cornmeal-agar medium under 12 h light/dark cycles. Synchronized wandering third-instar larvae were collected based on their characteristic wandering behavior (migration from food substrate to vial walls for pupariation) at ~120 hours post-egg laying.

RNA Isolation and RT-qPCR analysis

Midguts from the Ctrl and KD groups were microdissected for RNA extraction using RNA Isolator (Vazyme R401). Reverse transcription of 1 μg total RNA per sample was performed using HiScript II Q Select RT Supermix (Vazyme R233). cDNA templates were diluted 10-fold prior to quantitative PCR analysis with ChamQ SYBR Mastermix (Vazyme Q311). Primer pairs (designed using FlyPrimerBank; ref. 29) were validated through standard curve generation with serial cDNA dilutions. Gene expression was quantified based on cycle threshold (Ct) values normalized to the αTub84B reference gene. Primer sequences are as follows:

HHEX-F: 5′-GTTCAGCCAACAGCCTATTGT-3′

HHEX-R: 5′-GGAGGCAGGATTGGGGAAT-3′
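The normalization of Ct values to a reference gene described above can be illustrated with a short sketch. It assumes the common 2^-ΔΔCt calculation, which the text does not name explicitly; the function name and all Ct values below are hypothetical and are not measurements from this study.

```python
from statistics import mean


def relative_expression(ct_target, ct_reference, ct_target_ctrl, ct_reference_ctrl):
    """Fold change of a target gene relative to control (2^-ddCt, assumed method).

    Each argument is a list of replicate Ct values.
    """
    # dCt: target Ct normalized to the reference gene within each group
    delta_ct_sample = mean(ct_target) - mean(ct_reference)
    delta_ct_ctrl = mean(ct_target_ctrl) - mean(ct_reference_ctrl)
    # ddCt: difference between the sample group and the control group
    delta_delta_ct = delta_ct_sample - delta_ct_ctrl
    return 2 ** (-delta_delta_ct)


# Hypothetical Ct values for illustration only (not data from this study)
kd_fold = relative_expression(
    ct_target=[26.0, 26.2, 25.8],        # target gene, KD group
    ct_reference=[18.0, 18.1, 17.9],     # reference gene, KD group
    ct_target_ctrl=[24.0, 24.1, 23.9],   # target gene, Ctrl group
    ct_reference_ctrl=[18.0, 18.0, 18.0],  # reference gene, Ctrl group
)
print(f"Relative expression in KD vs Ctrl: {kd_fold:.2f}")
```

A fold change below 1 corresponds to reduced target expression in the KD group relative to Ctrl.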

Tissue dissociation and single-cell suspension preparation

Midgut dissociation was performed following established protocols26. Third-instar larvae of KD group and Ctrl group were dissected in ice-cold PBS containing 1% BSA (w/v). For each experimental group, >30 midguts were collected after removal of the crop, midgut-hindgut junction (with associated Malpighian tubules), and residual peritrophic matrix.

Midguts were minced into fragments and digested in 400 μL elastase solution (1 mg/mL in PBS) in 1.5 mL Eppendorf tubes with orbital shaking at 27 °C (300 rpm, 30 min). Reactions were stopped by adding 1% BSA. Cell suspensions were sequentially filtered through 100 μm and 40 μm nylon meshes, then subjected to density gradient centrifugation using OptiPrep (Axis-Shield) at 1.12 g/mL. After centrifugation (800 × g, 20 min, 4 °C), viable cells were harvested from the upper interface layer. Cell viability (>90%) was confirmed by trypan blue exclusion.

scRNA-seq library preparation, sequencing, and genomic alignment

scRNA-seq libraries were prepared using the DNBelab C Series Single-Cell Library Prep Kit (MGI, 940-001924-00), with each library capturing approximately 20,000 cells. To ensure experimental reproducibility, two biological replicate libraries were generated for each experimental group, resulting in a total of four libraries (two each for KD group and Ctrl group).

The experimental workflow consisted of sequential phases: microfluidic droplet generation, reverse transcription within emulsion droplets, emulsion destabilization for product recovery, magnetic bead-based cDNA purification, and PCR amplification of cDNA libraries. Following library preparation, DNA quantification was conducted using the Qubit ssDNA Assay Kit (Thermo Fisher Scientific, Q10212). Libraries subsequently underwent paired-end sequencing on the MGI DNBSEQ-T1 platform at the China National GeneBank30.

Raw sequencing data were aligned to the Drosophila melanogaster reference genome (BDGP6.28 assembly, GenBank accession: GCA_000001215.4) through the standardized DNBelab C Series analysis pipeline (https://github.com/MGI-tech-bioinformatics/DNBelab_C_Series_HT_scRNA-analysis-software)31,32,33. Read mapping parameters included specification of an expected cell count (expect cells = 20000) corresponding to library preparation parameters. Detailed alignment statistics and quality control metrics are summarized in Table 1.

Table 1 Quality control metrics for single-cell RNA sequencing alignment.

scRNA-seq data processing

Raw expression matrices were processed into AnnData objects (anndata v0.9.2) using Python v3.9, followed by rigorous quality control: cells with <200 UMIs or <100 detected genes were filtered (Scanpy.pp.filter_cells), genes expressed in <3 cells or with >20,000 counts were excluded (Scanpy.pp.filter_genes), and mitochondrial gene contamination (genes with mt: prefix) was restricted to <8% per cell34,35. Doublets were computationally removed using OmicVerse.pp.qc with an automated threshold score of 0.53 (ref. 36). Batch-corrected latent embeddings were generated via scVI (n_latent = 15), trained for 200 epochs on GPU-accelerated hardware using default neural network architecture (2 hidden layers, 128 nodes each), enabling downstream uniform manifold approximation (Scanpy.tl.umap) and Leiden clustering (resolution = 1.6)37.
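The per-cell thresholds above can be expressed as a small predicate. The following is a minimal sketch in plain Python rather than the Scanpy calls used in the study; the CellQC record, its field names, and the example cells are hypothetical, and only the three thresholds come from the text.

```python
from dataclasses import dataclass

# Thresholds taken from the text
MIN_UMIS = 200      # cells with fewer UMIs are filtered
MIN_GENES = 100     # cells with fewer detected genes are filtered
MAX_PCT_MT = 8.0    # mitochondrial fraction must stay below this percentage


@dataclass
class CellQC:
    """Hypothetical per-cell summary; not a structure used by the authors."""
    barcode: str
    n_umis: int
    n_genes: int
    mt_umis: int  # UMIs assigned to genes with the mt: prefix


def pct_mt(cell: CellQC) -> float:
    """Percentage of UMIs from mitochondrial genes."""
    return 100.0 * cell.mt_umis / cell.n_umis if cell.n_umis else 0.0


def passes_qc(cell: CellQC) -> bool:
    """Apply the three per-cell filters described in the text."""
    return (
        cell.n_umis >= MIN_UMIS
        and cell.n_genes >= MIN_GENES
        and pct_mt(cell) < MAX_PCT_MT
    )


# Illustrative cells only
cells = [
    CellQC("A", n_umis=1920, n_genes=1055, mt_umis=50),  # passes all filters
    CellQC("B", n_umis=150, n_genes=90, mt_umis=5),      # too few UMIs and genes
    CellQC("C", n_umis=500, n_genes=300, mt_umis=60),    # 12% mitochondrial
    CellQC("D", n_umis=200, n_genes=100, mt_umis=15),    # exactly at the minima
]
kept = [c.barcode for c in cells if passes_qc(c)]
print(kept)
```

The gene-level filters and doublet removal are separate steps and are not covered by this sketch.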

Clustering and marker genes calculation

Single-cell transcriptomic analysis was performed using Omicverse v1.5.0 and Scanpy v1.9.3 (refs. 35,36). Raw count matrices were preprocessed through Pearson residual-based normalization and variance stabilization via Omicverse.pp.preprocess, retaining 3,000 highly variable genes (HVGs), followed by zero-centered scaling using Omicverse.pp.scale. Dimensionality reduction used the 15-dimensional scVI latent representation (n_pcs = 15) to construct a neighborhood graph (Scanpy.pp.neighbors, n_neighbors = 15), with subsequent UMAP embedding (Scanpy.tl.umap)37. Clustering was performed with the Leiden algorithm at resolution 1.6 (Scanpy.tl.leiden), identifying 32 transcriptionally distinct clusters. Marker genes were prioritized using Wilcoxon rank-sum tests (Scanpy.tl.rank_genes_groups) and refined with COSG (Omicverse.single.cosg, top 30 genes per cluster). Final annotations integrated orthology mapping from FlyBase (https://flybase.org/) and conserved expression patterns from published Drosophila intestinal cell atlases26,38.

Statistical comparison of cell population abundance

Cell population changes between the KD and Ctrl groups were examined using milopy v0.1.1 (ref. 39). The analysis started from the scVI-derived latent representation, on which a k-nearest neighbor graph (k = 100, 15 dimensions) was built. Sample-level cell counts were derived from the sample identifiers stored in the dataset metadata. A generalized linear model tested for abundance changes between the KD and Ctrl conditions (p_value < 0.05). Log-fold change (log2FC) values represent directional changes in the KD group: positive log2FC values indicate an increase in the KD group, and negative values indicate a decrease. Results are shown as violin plots colored by cell type. This method quantifies population dynamics while reducing technical noise.
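To make the sign convention of the log2FC values concrete, the sketch below computes a simplified per-cell-type log2 fold change from library-size-normalized counts. This is not the neighborhood-level generalized linear model applied by milopy; it is a deliberately reduced illustration. The function name, the pseudocount, and the per-cluster counts are hypothetical, while the group totals are the retained cell numbers reported in Technical Validation.

```python
import math


def log2fc_abundance(kd_count, kd_total, ctrl_count, ctrl_total, pseudocount=1.0):
    """Simplified log2 fold change of a population's abundance (KD vs Ctrl).

    Counts are normalized by the total cells per group so that differing
    library sizes do not drive the result. The pseudocount (an assumption of
    this sketch) avoids taking the logarithm of zero.
    """
    if kd_total <= 0 or ctrl_total <= 0:
        raise ValueError("group totals must be positive")
    kd_frac = (kd_count + pseudocount) / kd_total
    ctrl_frac = (ctrl_count + pseudocount) / ctrl_total
    return math.log2(kd_frac / ctrl_frac)


KD_TOTAL = 18958    # retained KD cells reported in the text
CTRL_TOTAL = 23100  # retained Ctrl cells reported in the text

# Hypothetical per-cluster counts, for illustration only
expanded = log2fc_abundance(400, KD_TOTAL, 100, CTRL_TOTAL)
depleted = log2fc_abundance(50, KD_TOTAL, 300, CTRL_TOTAL)
print(f"expanded in KD: {expanded:+.2f}, depleted in KD: {depleted:+.2f}")
```

Positive values correspond to populations enriched in the KD group and negative values to populations depleted in it, matching the convention stated above.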

Data Records

All raw sequencing data (paired-end reads in FASTQ format) generated in this study have been deposited under restricted access in the China National GeneBank Nucleotide Sequence Archive (CNSA; Project accession: CNP0007162; https://doi.org/10.26036/CNP0007162) at https://db.cngb.org/data_resources/project/CNP0007162/ (refs. 40,41,42). This controlled access facilitates data sharing according to established protocols. The dataset comprises four paired libraries, grouped as follows: Ctrl Group (Ctrl-1, Ctrl-2) and KD Group (KD-1, KD-2). Detailed library metadata, including unique library identifiers, experimental group assignments, and associated raw data file names (X_1.fastq.gz, X_2.fastq.gz), are listed in Table 2 and are also accessible via the corresponding sample records on the CNSA project page40. Processed single-cell gene expression data, including cell-type annotations, are publicly available via Figshare (https://doi.org/10.6084/m9.figshare.28927709) at https://figshare.com/articles/dataset/10_6084_m9_figshare_28927709 (ref. 43). This repository contains: (1) Per-sample raw count matrices, packaged in ZIP archives named after each library (e.g., Ctrl-1.zip). Each archive contains the standard files barcodes.tsv.gz, features.tsv.gz, and matrix.mtx.gz. (2) The integrated processed dataset, stored as an AnnData object (file: NP1_HHEX_anno.h5ad). This file contains the expression matrix, computed dimensionality reduction coordinates, cluster assignments, and curated cell-type annotations.

Table 2 Single-cell RNA sequencing sample metadata.

Technical Validation

Functional validation of HHEX knockdown efficacy

We systematically validated the establishment of the KD group at both experimental and analytical levels. In the larval midgut of the KD group, qPCR analysis revealed a significant reduction in HHEX expression compared with the Ctrl group (Fig. 1b). Phenotypic observations demonstrated a marked shortening of midgut length in the knockdown group (Fig. 1c,d), while overall larval body weight remained unaffected (Fig. 1e). These results indicate that EC-specific HHEX knockdown impairs midgut development without compromising essential intestinal functions.

Consistent with these findings, single-cell transcriptomic analysis showed significantly reduced HHEX expression in ECs annotated within the KD group dataset, aligning with experimental validation of knockdown efficacy (Fig. 1f).

Quality control of single-cell transcriptomic data

We employed the GAL4-UAS system to knock down HHEX in Drosophila midgut enterocytes, investigating its role in intestinal development and function. Midguts from third-instar larvae of the Ctrl and KD groups were dissected and dissociated into single-cell suspensions. Two biological replicate libraries were prepared per experimental group (Ctrl and KD), as described in Methods. Raw sequencing data were processed through standard pipelines.

Following stringent quality control, we retained 23,100 high-quality cells from Ctrl group (11,209 and 11,891 cells per library) and 18,958 cells from KD group (7,876 and 11,082 cells per library). The mean UMI count per cell was 1,920, with 1,055 genes detected per cell on average (Fig. 2a,b). Technical reproducibility was validated through Pearson correlation analysis of average gene expression profiles (Fig. 2c). UMAP visualization revealed minimal batch effects between replicates, with substantial overlap between Ctrl and KD populations. However, distinct non-overlapping cell clusters emerged specifically in KD samples (Fig. 2d,e), suggesting potential HHEX-dependent transcriptional divergence.
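The replicate comparison described above rests on the Pearson correlation of average expression profiles. A minimal sketch of that calculation follows; the function and the short expression vectors are hypothetical stand-ins for per-library mean expression across genes, not values from the dataset.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("inputs must have equal length of at least 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical mean expression per gene in two replicate libraries
replicate_1 = [0.10, 2.30, 1.10, 0.00, 4.20, 0.75]
replicate_2 = [0.12, 2.10, 1.25, 0.02, 4.00, 0.80]
r = pearson(replicate_1, replicate_2)
print(f"Pearson r between replicates: {r:.3f}")
```

Applying the same function to every pair of libraries yields a correlation matrix of the kind shown in Fig. 2c.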

Fig. 2

Quality Control and Dataset Integration for Drosophila midgut scRNA-seq Datasets. (a) Boxplot distributions of detected genes, UMI counts and mitochondrial gene percentage across 4 sequencing libraries. Dashed lines indicate quality thresholds: genes > 100, UMIs > 200, pct_counts_mt < 8%. (b) Group-wise comparisons of cellular gene/UMI counts (Ctrl vs KD) using the same metrics as in (a). (c) Pearson correlation matrix of normalized expression profiles. Color intensity scales from red (high similarity) to blue (low similarity). (d) UMAP visualization of integrated datasets after batch-effect correction with scVI (see Methods). (e) UMAP projection of batch-corrected single-cell profiles. Cells are colored by sequencing library (Ctrl-1: n = 11,209; Ctrl-2: n = 11,891; KD-1: n = 7,876; KD-2: n = 11,082), with gray background indicating the combined cell distribution.

Single-cell transcriptomic annotation defines functional cell-type diversity in the Drosophila larval midgut

The Drosophila larval midgut single-cell transcriptomes were clustered into 32 initial groups, with clusters containing fewer than 10 cells in either experimental condition (KD group or Ctrl group) excluded to ensure analytical reliability. The remaining 31 robust clusters exhibited balanced cellular representation across experimental groups, as detailed in Table 3. Cluster annotation was performed based on cell-type-specific marker genes listed in Supplementary Table 1, enabling precise identification of distinct cellular subpopulations. UMAP visualization of these annotated clusters revealed clear segregation of cell types (Fig. 3a), with consistent topological organization between biological replicates.
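The cluster exclusion rule above (fewer than 10 cells in either condition) can be sketched as follows. The dictionary layout, function name, and all counts are hypothetical; only the 10-cell minimum comes from the text.

```python
MIN_CELLS_PER_CONDITION = 10  # threshold stated in the text


def robust_clusters(counts):
    """Return cluster labels with at least the minimum cells in BOTH conditions.

    `counts` maps a cluster label to a dict of per-condition cell counts,
    e.g. {"C00": {"Ctrl": 120, "KD": 950}}.
    """
    return sorted(
        cluster
        for cluster, per_condition in counts.items()
        if min(per_condition.get("Ctrl", 0), per_condition.get("KD", 0))
        >= MIN_CELLS_PER_CONDITION
    )


# Hypothetical counts for illustration only
example_counts = {
    "C00": {"Ctrl": 120, "KD": 950},
    "C01": {"Ctrl": 800, "KD": 640},
    "C31": {"Ctrl": 45, "KD": 3},   # fewer than 10 KD cells: excluded
    "C32": {"Ctrl": 10, "KD": 10},  # exactly at the minimum: retained
}
print(robust_clusters(example_counts))
```

Requiring the minimum in both conditions, rather than in total, keeps every retained cluster comparable between the KD and Ctrl groups.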

Table 3 Cell cluster annotation and population distribution.
Fig. 3

Single-Cell Transcriptomic Atlas of Drosophila Larval Midgut. (a) UMAP visualization of 31 annotated cell clusters after batch-effect correction. Colors denote distinct cell types, including Adult Midgut Progenitors (AMPs), Enteroendocrine Cells (EEs), Enterocytes (ECs), HHEX-knockdown induced cells (HICs), Copper cells, Gastric caecum and Plasmatocytes. (b) Dot plot of conserved marker genes across cell types (minimum 25% expression per cluster). Classic lineage markers are highlighted: AMPs (esg, Dl), EEs (pros), and ECs (Chs2, gas). Dot size indicates expression prevalence; color intensity reflects mean log-normalized counts. (c) UMAP projections showing expression patterns of key functional genes.

Annotation using cluster-specific marker genes cross-referenced with FlyBase and the literature identified 31 midgut cell types, including MP, AMP/EE progenitors, AMPs-like, ECs (including 18 metabolically specialized EC populations), two EE subtypes, HHEX-KD Induced Cells (HICs), copper cells, and gastric caecum (Fig. 3b). The SNAIL family transcription factor Escargot (esg) and the stem cell marker Delta (Dl) serve as marker genes for AMPs in Drosophila44. Enteroendocrine cells (EEs) specifically express Prospero (Pros) and secrete diverse gut hormone peptides, including Allatostatins (AstA, AstB, AstC) and Tachykinin (Tk). Enterocytes (ECs) are characterized by the expression of Chitin synthase 2 (Chs2), gustatory receptor (gas), and hydrolase-encoding genes such as Jon99Aii (refs. 25,27).

Clusters 20 and 24 were classified as midgut progenitors (C20_MP_1 and C24_MP_2) based on selective expression of Dl (encoding a Notch ligand critical for progenitor maintenance) and absence of esg expression. Cluster 30, designated C30_AMP/EE_progenitors, exhibited co-expression of Dl and Pros, a transcription factor required for enteroendocrine cell specification, suggesting lineage priming toward enteroendocrine differentiation. Two distinct AMPs-like populations (C08_AMPs-like_1 and C26_AMPs-like_2) were identified in Clusters 8 and 26, both maintaining esg and Sox100B expression45,46,47. Notably, Cluster 26 specifically expressed Chs2 (chitin synthase 2), indicating functional divergence from other AMPs-like clusters.

Clusters 5 and 15 displayed co-expression patterns of stemness markers (esg, Sox100B), enterocyte marker gas, and proliferation marker PCNA, suggestive of transitional states during differentiation from midgut progenitors to enterocytes25,48. These observations led us to tentatively classify these clusters as putative transitional populations (C05_MP > ECs_1 and C15_MP > ECs_2) (Fig. 3c).

Two EE subtypes were resolved: Cluster 18 (pros+/AstA+) and Cluster 29 (pros+/vn+). Cluster 29 additionally expressed the stress-response genes eEF1alpha2 and Hsp68 and exhibited a reduced proportion in the KD group (Fig. 4a,b).

Fig. 4

Cell Population Changes in Larval Midgut. (a) Stacked bar plot showing KD (cluster-matched colors) vs Ctrl (gray) proportions within each cell type, calculated as KD cells in X / total cells in X and Ctrl cells in X / total cells in X. Bars sorted by descending KD proportion values. Exact proportions (0.00–1.00 scale) shown above bars (two decimal places). X-axis: annotated cell types; Y-axis: proportion. (b) Violin plots showing log2FC for abundance in KD group. Sorted high to low. Dashed line at 0. log2FC > 0: KD abundance increase; log2FC < 0: KD abundance decrease. Colors match cell types. X-axis: annotated cell types; Y-axis: log2FC.
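The proportions defined in panel (a) can be reproduced with a few lines of code. The sketch below follows the stated formula (cells of one group in a cell type divided by all cells in that cell type) and the descending sort by KD proportion; the function names and the example counts are hypothetical.

```python
def group_proportions(kd_cells, ctrl_cells):
    """KD and Ctrl proportions within one cell type, rounded to two decimals."""
    total = kd_cells + ctrl_cells
    if total == 0:
        raise ValueError("cell type contains no cells")
    return round(kd_cells / total, 2), round(ctrl_cells / total, 2)


def sorted_by_kd_proportion(counts):
    """Order cell types by descending KD proportion, as in the bar plot.

    `counts` maps a cell-type label to a (kd_cells, ctrl_cells) tuple.
    """
    rows = [(label, *group_proportions(kd, ctrl)) for label, (kd, ctrl) in counts.items()]
    return sorted(rows, key=lambda row: row[1], reverse=True)


# Hypothetical counts for illustration only
example = {
    "type_A": (30, 70),
    "type_B": (90, 10),
    "type_C": (50, 50),
}
for label, kd_prop, ctrl_prop in sorted_by_kd_proportion(example):
    print(f"{label}: KD {kd_prop:.2f}, Ctrl {ctrl_prop:.2f}")
```

Because these proportions are not normalized for the different number of cells retained per group, they are best read alongside the log2FC values in panel (b).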

Region-specific populations included Copper cells (Cluster 25, CAH1+/Vha100-4+) localized to the mid-midgut and Gastric caecum (Cluster 28, Acbp4+) in the anterior region (Fig. 3c). Metabolic specialization divided ECs into anterior populations (Clusters 12, 19, 28) expressing Jon65Ai and alphaTry, posterior groups (Clusters 14, 27) marked by iotaTry and zetaTry, detoxification-focused ECs (Cluster 7) enriched in Aldh, and chitin-producing ECs (Clusters 4, 11) with Chs2 expression. Within this functional landscape, Cluster 22 emerged as a ubiquitin ligase-enriched EC subtype exhibiting distinctively high E(spl)m6-BFM expression levels.

Clusters C16 and C17 were characterized by pronounced expression of Hemolectin (Hml) and Hemese (He), two established markers for plasmatocytes and hemocytes (Fig. 3c)49,50,51. Based on these distinct transcriptional profiles, we annotated these clusters as C16_Plasmatocytes_1 and C17_Plasmatocytes_2. Notably, eater (et), a hemocyte-specific gene known to be involved in phagocytosis of Gram-positive bacteria and hemocyte adhesion to sessile niches, was further enriched in C16 (refs. 52,53,54). In addition, C16 abundance appeared to be reduced in the KD group compared with the Ctrl group (Figs. 3b, 4a,b).

Clusters C00, C02, and C13 exhibited elevated expression of Antp and pb, two Hox family transcription factors implicated in midgut development and endoderm-derived cell differentiation, beyond their established roles in embryonic body patterning (Fig. 3c). Intriguingly, the human homolog of pb (HOXA2) has been reported to be upregulated in inflammatory bowel disease, while Antp’s human homolog (HOXA7) has been linked to colorectal cancer metastasis55,56. These parallels raise the possibility that pb and Antp contribute to midgut inflammatory regulation. Notably, the co-occurrence of Myo81F (a myosin family member hypothesized to facilitate collective cell migration via cytoskeletal remodeling) with the stemness marker Dl in Cluster 13 points to Cluster 13 as a potential precursor to Clusters 00 and 02 (Fig. 3c). Previous studies have demonstrated that upon infection or damage to ECs, the midgut epithelium undergoes a marked expansion of small progenitors expressing Dl (refs. 22,57,58,59). Considering their increased abundance in the KD group (Fig. 4a,b) and partial retention of stem-like properties (e.g., Dl expression), these clusters were provisionally designated as HHEX-KD induced cells (C00-HICs_1, C02-HICs_2, C13-HICs_3), emerging specifically following HHEX knockdown.