Introduction

Soil-transmitted helminth infections represent the most prevalent neglected tropical diseases globally, imposing a significant disease burden in low- and middle-income countries, particularly among school-aged children1,2. The predominant STHs include Ascaris lumbricoides, Trichuris trichiura, hookworm, and Strongyloides stercoralis3,4. These infections, associated with poor hygiene, contaminated food or water consumption, and low socioeconomic status4,5. The World Health Organization estimates that approximately 1.5Ā billion people are infected with STHs4. The highest prevalence of STH infections occurs in sub-Saharan Africa, the Americas, South Asia, and Southeast Asia4,6,7. In Thailand, despite decreasing trends in STH infection rates resulting from disease surveillance and control policies outlined in the Thai Ministry of Public Health roadmap 2017–2026, these infections continue to persist across various regions of the country8,9. The nationwide helminthiases survey conducted in Thailand in 2019 across all age groups revealed an overall prevalence of helminthic infections of 9.79%, with over 14 species identified. Hookworms demonstrated the highest prevalence (4.47%). The southern region of Thailand reported the highest hookworm infection rates, ranging from 8.06 to 23.08% across the ten southern provinces. This nationwide study provided geographical distribution at the provincial level and generated predicted prevalence maps for hookworms using an inverse distance weighting spatial interpolation technique8. While predicted prevalence maps of STH infections are essential tools for guiding public health interventions, their limitations necessitate careful interpretation and consideration of additional data and methodologies to enhance accuracy and applicability. Despite national-level mapping, detailed province-specific studies are crucial, as risk factors and transmission patterns may vary significantly between localities. Previous research in Nopphitam District, Nakhon Si Thammarat Province, southern Thailand, documented an overall STH prevalence of approximately 11.0% among primary schoolchildren, with hookworm predominating at 10.7%, followed by T. trichiura at 0.3%10. This investigation examined personal, environmental, and behavioral risk factors through a structured questionnaire, revealing that older age groups of primary schoolchildren (10–12 years) and inadequate handwashing before meals were significantly associated with hookworm infections. This study provided supporting spatial data characterizing the study area as having a distinct tropical hot and humid climate, which is suitable for the proliferation of STHs. However, there remains a critical gap in research regarding the development of risk area mapping for southern Thailand, particularly in relation to ongoing STH transmission. Geographic and environmental conditions significantly influence the spatial distribution of parasites. Advanced tools such as geographic information systems (GIS) and satellite imagery enable researchers to map high-risk areas and predict disease transmission patterns8,11,12,13. Recent studies have applied sophisticated machine learning methods, including random forest algorithms, ANNs, and CNNs, to enhance the analysis of environmental risk factors in remote or inaccessible regions. These approaches facilitate more effective parasite and/or vector surveillance and inform the development of targeted control strategies11,14,15.

Our study addresses a critical research gap by examining STH infection prevalence and risk factors among school-aged children in Thasala District, Nakhon Si Thammarat Province, southern Thailand. Our innovative approach integrates traditional parasitological methods with satellite imagery analysis to characterize diverse land-use patterns including agricultural, residential, forest, aquatic, and built environments. By correlating specific environmental contexts with infection data, we identify high-risk ecological niches facilitating parasite transmission. This methodology provides a cost-effective framework for targeted interventions in resource-limited settings without requiring extensive field surveys. Our findings enhance understanding of STH transmission dynamics in southern Thailand and offer a reproducible model for predicting infection hotspots applicable to similar endemic regions globally.

Methods and materials

Study area

A cross-sectional study was conducted from June to August 2024 in primary schools in Thasala District, Nakhon Si Thammarat Province, covering an area of approximately 375.53 square kilometers. Thasala is a coastal district along the Gulf of Thailand and is divided into 10 sub-districts. The topography consists of both sandy and clay soils, with most of the population engaged in agriculture16. In this study, six sub-districts were selected: Taling Chan, Klai, Thai Buri, Thasala, Moklan, and Don Tako. The selected areas were based on similar geographical characteristics: Klai and Thasala are adjacent to the sea; Taling Chan and Thai Buri are on similar longitudes with Klai and Thasala, respectively, so they can be compared; and Moklan and Don Tako are non-landlocked. This selection allowed for geographical diversity while ensuring that some sub-districts shared similar characteristics, making the study comprehensive and spatially balanced as show in Fig.Ā 1.

Fig. 1
figure 1

Sources: Esri, TomTom, Garmin, FAO, NOAA, USGS, OpenStreetMap contributors, and the GIS User Community. For more information about Esri software, please visit http://www.esri.com. All other layers were produced by the authors and are copyright-free.

The study area, which includes six sub-districts in Thasala District, Nakhon Si Thammarat Province: (1) Taling Chan, (2) Klai, (3) Thai Buri, (4) Thasala, (5) Moklan, and (6) Don Tako. Map was modified from Wikipedia Commons: https://en.wikipedia.org/wiki/Tha_Sala_district and ArcGIS software by Esri.

Sampling and sample size

A simple random sampling method from selected six sub-districts was used. Children, who had not received anthelminthics in the three months before attending the study, were included for screening of STHs infections. The sample size was calculated using the formula for estimating a finite population proportion17. Based on a prevalence rate (p) of 16%10, with a 95% confidence interval (z = 1.96) and a margin of error of 5% (d = 0.05), the estimated number of students aged 6–12 years was 10,71118. To allow for a drop-out rate of 25%, 254 individuals were required. However, a total of 319 participants had returned the assent and informed consent forms, so we included all individuals in the study.

Stool sample collection and STHs detection

A clean, labeled stool collection container with a scoop attached inside was distributed to all students who consented to participate a few days prior to specimen collection. On the day of container distribution, we provided verbal instructions using age-appropriate language and supplied documentation with clear illustrated protocol suitable for their age group. Students were instructed to take the containers home and collect stool samples according to the provided guidelines. They were directed to collect approximately 5–6Ā g of their morning stool specimen using the provided scoop and place it in the container, avoiding contamination with urine, water, or soil. To ensure proper handling, parents or guardians were informed about the study through written consent forms and were requested to assist their children with the collection process when necessary, following the enclosed instructions. The students then returned the samples to their schools, where our staff collected and transported them to the Medical Technology Laboratory at Walailak University for intestinal parasite detection.

We employed two highly accurate laboratory techniques to maximize the detection of parasites: FECT and APC. The FECT is highly sensitive, capable of detecting minimal quantities of parasites, particularly helminth eggs and larvae. The APC method cultivates parasites on agar plates to observe nematode growth, thereby enhancing the sensitivity for detecting S. stercoralis and/or hookworm infections. The preparation of stool samples and parasite detection protocols were conducted as previously described19,20.

Questionnaire survey

The questionnaire was administered by both the children and their parents, depending on the child’s ability to read and understand the questionnaire. For children who were unable to read, typically younger children in grades 1–3, a parent or guardian was instructed to complete the questionnaire on their behalf. This ensured that data collection remained accurate and comprehensive, regardless of the child’s ability to read. The questionnaire, adapted from previous research, was designed to evaluate various factors potentially associated with parasitic infections10. It included questions covering demographic information (gender and grade level), personal hygiene practices (handwashing, vegetable washing, and consumption of thoroughly cooked food), environmental behaviors (barefoot walking, toilet usage, and filtered water consumption), and risk behaviors (finger biting, interaction with pets, and nail trimming).

Statistical analysis of STHs infections and risk factors

Demographic data were presented using percentages and participant counts. The prevalence of STHs infections calculated by the number of infected children divided by a total of sample collection at time point multiply one hundred. The association between STHs infections and related factors was examined using Fisher’s Exact Test. Statistical test was considered significant at a p-value of less than 0.05. All analyses were performed using STATA package version 10.1 (StataCorp LLC, College Station, TX, USA).

Development of a predictive model to identify high-risk areas

Our study developed an innovative predictive model that integrates CNNs for land-use classification with ANNs that process PCA-refined data (Fig.Ā 2). We employed CNNs for land-use classification using publicly available datasets from the USGS National Map Urban Area Imagery collection (21 classes, 100 images per class at 256 × 256 pixels)21. Model robustness was enhanced through image augmentation techniques including 90-degree rotations, averaging filters, noise introduction, and sharpening. We evaluated three CNN architectures22namely custom-designed CNN, ResNet50, and DenseNet121, using consistent hyperparameters and assessed their performance through accuracy, precision, recall, and F1-score metrics. The best-performing model was exported with optimized parameters to establish a foundation for the next step. Using a pre-validated classification model, we analyzed 30 satellite images from six schools (5 images per school) (Fig.Ā 3), extracting unique architectural and locational parameters. This approach demonstrates how remote sensing and machine learning can enhance educational infrastructure analysis. All images were standardized to a resolution of 256 × 256 pixels to maintain uniformity in spatial resolution and facilitate comparative analyses across the dataset. To correlate satellite image parameters with STH prevalence, we integrated PCA with ANN. Our field survey-confirmed prevalence data and school parameters were combined and expanded using low variance noise. PCA reduced dimensionality (0.95 explained variance), with principal components feeding into an ANN for prevalence prediction. For area scanning, we established precise latitude-longitude boundaries with optimized step sizes to ensure comprehensive coverage. The well-trained CNN and ANN models worked collaboratively to classify regions, capturing both detailed features and broader patterns in satellite imagery. The resulting risk map was overlaid on original images, providing visual representation of spatial variations to support informed decision-making.

Fig. 2
figure 2

Four-stage satellite imagery analysis for STH transmission risk mapping. Our methodology includes: (a) CNN model development using UC Merced Land Use Dataset with image preprocessing; (b) Application of trained model to classify land-use patterns surrounding six schools in Thasala District; (c) Construction of a predictive model integrating parasitological diagnosis results with school environmental parameters; (d) Development of a geographic risk prediction system that generates spatial risk distribution visualizations based on coordinates.

Fig. 3
figure 3

Representative satellite imagery of six school campuses across different sub-districts in Thasala District used for predictive STH infection risk modeling: (a) Thai Buri, (b) Thasala, (c) Taling Chan, (d) Klai, (e) Moklan, and (f) Don Tako sub-districts. The satellite imagery was obtained and visualized using Folium version 0.19.5 (https://python-visualization.github.io/folium/) in the Python programming environment.

Ethics declarations

The study’s protocol was approved by the Ethics Committee in Human Research Walailak University (Approval No: WUEC-24-177-01). The study followed the Declaration of Helsinki. The study’s purpose and procedures were explained to the participants prior to enrolment. Written informed consent was obtained from all participants and the parent or the legal guardian of the child before the study onset. All study participants infected with STHs were treated with mebendazole or ivermectin.

Results

Participant recruitment and prevalence of STHs infections

Stool samples and questionnaires were collected from children aged 6–12 years across six sub-districts: Taling Chan, Klai, Thai Buri, Thasala, Moklan, and Don Tako. A total of 917 children were randomly selected for participation, and 319 provided both stool samples and completed questionnaires, following assent and informed consent procedures. The number of participants from each sub-district is detailed in TableĀ 1.

Table 1 Number of participants recruited and prevalence of STH infections by sub-district.

Among the 319 samples analyzed using the FECT and APC methods, 31 cases of SIHs infections were detected, representing an overall prevalence of 9.72%. Mono-infections were more common than mixed infections, with T. trichiura being the most frequently detected parasite (5.02%), followed by hookworm (3.49%). Mixed infections were less prevalent, accounting for 1.25% of cases. The most frequently observed co-infection was hookworm and T. trichiura (0.94%), followed by hookworm and S. stercoralis (0.31%) (TableĀ 2).

Table 2 Frequency of parasitic infections using FECT and APC methods (n = 319).

Association between STHs infections and risk factors

An analysis of 319 schoolchildren revealed no significant associations between STH infections and gender, grade level, handwashing practices, food hygiene, walking barefoot, defecation practices, or drinking water source (p > 0.05). However, not keeping nails short was significantly associated with infection (16.18% vs. 7.97%, p = 0.047). Playing with pets was also associated with a higher infection (13.82% vs. 7.14%), nearing statistical significance (p = 0.05) (TableĀ 3).

Table 3 Analysis of the association between STHs infections and related factors (n = 319).

Predictive spatial risk distribution map

A comparative analysis was conducted to evaluate the performance of three different model architectures using an augmented training dataset of publicly available datasets. A detailed comparison of performance metrics (TableĀ 4) reveals that our customized CNN demonstrated the lowest performance among the evaluated models. In contrast, DenseNet121 emerged as the top-performing model, showing its superior capability in feature extraction and learning from the enriched data. This finding suggests that DenseNet121’s inherent architectural design, characterized by its densely connected layers, is particularly effective at capitalizing on the additional information provided by the augmented training dataset.

Table 4 Performance metrics of each classification model.

Principal component analysis was employed to reduce the dimensionality of the dataset from 21 parameters to 4 principal components, while retaining 95% of the total variance (Fig.Ā 4a). This reduction not only streamlined the model complexity but also mitigated issues related to multicollinearity among input variables. Subsequently, an ANN was trained to regress the prevalence percentage, as evidenced by the training loss curve (Fig.Ā 4b). The refinement of the model was further validated through testing, with the total sum-square error calculated as 4.06e-4, highlighting the model’s precision and effectiveness in prediction. This integrated PCA-ANN methodology proves effective for processing high-dimensional environmental data while maintaining robust predictive performance in estimating STH infection prevalence. To assess the spatial accuracy of our model’s predictions, a comparative visualization of STH prevalence was generated (Fig.Ā 5). The heat map derived from empirical data collected in rural school settings provides direct evidence of localized infection patterns (Fig.Ā 5a). When applied to the study area, our predictive model effectively replicates these observed spatial distributions (Fig.Ā 5b), further validating the model’s accuracy in estimating prevalence while enabling identification of high-risk areas.

Fig. 4
figure 4

Dimensionality reduction and model training performance for STH infection risk prediction. (a) Cumulative explained variance ratio against number of principal components, showing that 4 components capture 95% of total variance. (b) Training loss curve of the ANN model demonstrating rapid convergence within 20 epochs.

Fig. 5
figure 5

Comparative heat maps of STH prevalence with white circles indicating geographic locations on latitude-longitude coordinates. (a) Observed STH prevalence based on field data from six sub-districts in Thasala District: (1) Taling Chan, (2) Klai, (3) Thai Buri, (4) Thasala, (5) Moklan, and (6) Don Tako. (b) Predicted STH prevalence generated by the CNN-PCA-ANN model incorporating environmental and spatial features. The color gradient ranges from blue (low prevalence) to red (high prevalence), with the color bar showing prevalence percentages for direct comparison between observed and predicted patterns.

Discussion

Our study explored the prevalence, associated risk factors, and predictive spatial risk distribution map of STH infections among school-aged children in Thasala District, Nakhon Si Thammarat Province, Thailand. This represents the first such report from southern Thailand in five years. With an overall prevalence of 9.72%, the findings confirm that STH infections remain a public health concern in rural southern Thailand, especially among children who are more vulnerable due to frequent exposure to contaminated soil and poor hygiene practices. The most detected parasite was T. trichiura, followed by hookworm. This infection pattern contrasts with previous studies in the southern region, which identified hookworm as the dominant species8,10. The success of previous deworming programs targeting hookworms but potentially less effective against T. trichiura might explain this shift in species predominance. Furthermore, the detection methods employed in this study may have enhanced sensitivity for T. trichiura compared to previous investigations in the region23. These findings demonstrate the dynamic nature of STH epidemiology and highlight the need for time- and location-specific assessments to develop effective control strategies. Interestingly, among the six sub-districts examined in this study, Thasala sub-district exhibited the highest prevalence of STH infections, with T. trichiura being the predominant species detected. The elevated prevalence of T. trichiura in this area may be attributed to several interconnected factors. First, parasite-related factors include the reduced efficacy of albendazole against T. trichiura compared to other STH species24which may contribute to persistent infections in areas where this anthelmintic has been previously administered as part of deworming programs. Second, environmental factors such as the forested and humid characteristics of the Thasala area may provide optimal conditions for the survival and development of T. trichiura infective stages, where the combination of adequate moisture, shade, and suitable soil conditions creates an environment conducive to parasite persistence and transmission. Third, host behavioral factors may include behavioral patterns specific to children in this area, such as frequent play activities in shaded, moisture-rich environments where T. trichiura eggs can survive and mature. Behavioral analysis revealed that most hygiene-related practices, including handwashing, food hygiene, and wearing shoes, were not significantly associated with STH infections in this study. While this contrasts with findings from several studies that underscore the importance of hand hygiene and safe food practices25,26,27the discrepancy may reflect cultural differences in hygiene routines, reporting bias in self-administered questionnaires, or the relatively small sample size, which limits statistical power. Nonetheless, the study did identify one significant behavioral risk factor: not cutting nails was associated with a markedly higher infection rate. This supports earlier findings that personal hygiene, particularly nail care, and plays a critical role in interrupting fecal-oral transmission pathways28. Additionally, playing with pets was associated with higher infection prevalence. Although not conclusive, this suggests that zoonotic transmission or exposure to contaminated environments via animal contact could contribute to hookworm risk in rural communities. Educational campaigns focused on hygiene and sanitation, as well as routine deworming programs, could further mitigate infection risks, as advocated by WHO guidelines on STH control in school-aged children29. Our study developed an innovative predictive model for STH infection risk by integrating CNN-based satellite imagery classification with ANN analysis after PCA dimensionality reduction. Using the Land-Use Scene Classification database, we found that DenseNet121 significantly outperformed traditional CNN architectures with augmented training data. Its densely connected design proved particularly effective at extracting features from enriched datasets30. Satellite images from the six schools were standardized to 256 × 256 pixels, effectively balancing structural detail with computational efficiency. This standardization approach facilitated robust comparative analyses across diverse school environments31. Reducing the 21-parameter dataset to four principal components via PCA, while retaining 95% of the variance, effectively addressed multicollinearity and streamlined the subsequent regression analysis. The low sum-square error observed during ANN testing demonstrates the efficacy of this PCA-ANN approach in predictive modeling32. This integrated analytic framework enabled the identification of specific environmental features, particularly higher proportions of agricultural land and closer proximity to water bodies, that were positively associated with elevated STH prevalence. The strong correlation between predicted and observed patterns validates our methodology’s practical utility for targeted public health interventions.

Our study has certain limitations. First, the use of single-day stool sample collection may have reduced diagnostic sensitivity, likely leading to an underestimation of the true prevalence of STHs infections. Collecting samples over multiple days would improve detection accuracy33. Second, the relatively small sample size may limit the generalizability of the findings. Additionally, the analysis did not separate risk factors by individual STH species, despite their differing modes of transmission. Future studies should consider species-specific analyses to better understand the distinct behavioral and environmental factors associated with each type of infection. A larger sample in future studies would enhance the statistical power of the results. Third, while our approach successfully generates spatial risk distribution maps, translating findings into targeted interventions remains challenging. Future work should implement back-analysis34SHAP-based feature importance35and sensitivity analysis36 to identify key environmental factors associated with STH prevalence. Model validation through targeted seasonal parasitological surveys in predicted high-risk areas is also essential. Furthermore, integrating GIS with our CNN-PCA-ANN approach would enhance our methodology by enabling real-time data collection, improving prediction accuracy, and facilitating locally-tailored interventions.

Conclusion

This study confirms STH infections remain a public health challenge among schoolchildren in rural southern Thailand, with an overall prevalence of 9.72%. Trichuris trichiura and hookworm were the predominant parasites, with mono-infections occurring more frequently than mixed infections. Poor hygiene practices, particularly not cutting nails, were significantly associated with infection risk. Our innovative predictive model integrating CNN-based land-use classification with ANN processing of PCA-refined data generated precise spatial risk distribution maps for STH infection. Comparative analyses between empirical heat maps and model outputs validated this approach. These findings demonstrate the effectiveness of combining satellite imagery analysis with machine learning for targeting public health interventions. To reduce STH burden, strengthening school-based hygiene education, improving sanitation, and implementing environmental control strategies guided by this geographic data will be essential.