Abstract
Crayfish play an important role in freshwater ecosystems, and sex classification is crucial for analyzing their demographic structures. This study performed binary classification using traditional machine learning and deep learning models on tabular and image datasets with an imbalanced class distribution. For tabular classification, features related to crayfish weight and size were used, and missing values were handled using different methods to create several dataset variants. Kolmogorov-Arnold networks demonstrated the best performance across all metrics, achieving accuracy rates between 95 and 100%. The image dataset was built from at least five images of each crayfish, and autoencoders were employed to extract meaningful features. In experiments conducted on these extracted features, support vector machines achieved 84% accuracy and multilayer perceptrons achieved 82% accuracy, outperforming the other models. To enhance performance, a novel architecture based on stacked autoencoders was proposed. While some models experienced performance declines, Kolmogorov-Arnold networks showed an average improvement of 3.5% across all metrics, maintaining the highest accuracy. To statistically evaluate performance differences, McNemar's and Wilcoxon tests were applied. The results confirmed significant differences between Kolmogorov-Arnold networks, support vector machines, multilayer perceptrons, and naive Bayes. In conclusion, this study highlights the effectiveness of deep learning and machine learning models in crayfish sex classification and provides a notable example of hybrid artificial intelligence models incorporating autoencoders.
Introduction
Crayfish are organisms that play a significant role in freshwater ecosystems and are used as biological indicators1. Various species of crayfish generally belong to the Malacostraca class, which includes terrestrial organisms that have adapted to live underwater2. These organisms are crucial in assessing water quality and the health of the ecosystem because their bodies are sensitive to environmental changes, providing essential information about water quality3.
The narrow-clawed crayfish (Astacus leptodactylus Eschscholtz, 1823), also described as synonymous with Pontastacus leptodactylus Eschscholtz, 18234, is Turkey’s only significant freshwater crayfish species and is considered one of the most valuable and economically important freshwater crayfish in Europe4,5.
The cleanliness of the environment in which crayfish live is directly related to the health and presence of these species. In clean water ecosystems, the presence and health of crayfish indicate that the chemical and physical properties of the water are in good condition6. Due to their sensitivity to water pollution, a decline in water quality or ecosystem degradation leads to noticeable changes in the number and health of these organisms. Therefore, monitoring crayfish and assessing their health is an important indicator of ecosystem cleanliness and sustainability7. The population dynamics and health status of crayfish provide valuable data for shaping ecosystem management and conservation strategies8.
Sex determination in crayfish is important for three main reasons. The first is to understand the reproductive cycles and demographic structures of crayfish populations9; the second is to determine fishing strategies and management of crayfish species10; and finally, for the systematic classification and taxonomic identification of crayfish species11. Sex determination is critical for understanding the reproductive cycles and demographic structures of crayfish populations. During breeding seasons, males and females may exhibit different behaviors, reproductive strategies, and habitat use; therefore, accurate sex determination is essential for reproductive management and understanding population dynamics9. Sex determination in crayfish is also important for optimizing fishing strategies and management practices for commercially valuable crayfish species. Protecting female crayfish during breeding seasons is particularly crucial for sustainable fishing practices10. Additionally, sex determination is important in the systematic classification and taxonomic studies of crayfish species. Specifically, identifying new species and determining sexual characteristics contribute to understanding biological diversity11.
Deep learning and machine learning algorithms have become powerful tools for solving complex problems, revolutionizing many scientific fields in recent years. These technologies, especially when working with large datasets, offer the ability to perform more accurate, faster, and more efficient analyses by surpassing the limitations of traditional methods12. Machine learning, and particularly deep learning algorithms, have been successfully applied in various domains such as image recognition, natural language processing, and genetic analysis, paving the way for discoveries and innovations in these fields13,14. One of the biggest advantages of these methods is their ability to extract meaningful patterns and features from large datasets without the need for human intervention12. As a result, it has become possible to analyze complex data in disciplines such as biology, medicine, and engineering, leading to more accurate predictions. In biological research, in particular, deep learning and machine learning techniques have led to groundbreaking advancements in areas such as species identification15, sex determination16, disease diagnosis17, and the identification of genetic variations18. These techniques eliminate the challenges and limitations of traditional methods, enabling the analysis of more complex and large datasets. For example, deep learning applications in image recognition are used to distinguish various species and subspecies, contributing to a better understanding of biodiversity.
Deep learning and machine learning also hold great potential in fisheries research, such as in the sex determination of crayfish19,20. These algorithms accelerate the process of automatically identifying and analyzing complex sex characteristics, offering significant advantages in both scientific studies and commercial applications. These technologies are considered revolutionary tools for obtaining critical information necessary for the conservation of biodiversity, management of aquatic ecosystems, and sustainable fishing practices20.
Studies on sex determination and species identification in crayfish and other aquatic products highlight the importance of deep learning and machine learning algorithms. For instance, Atilkan et al. (2024) compared deep learning and canonical machine learning models using weight, size, and sex data of healthy and diseased crayfish, along with images, achieving the highest accuracy by combining ResNet50 and RF algorithms17. Hasan and Siregar (2021) successfully identified the species, sex, and age of marine crayfish in Indonesia using computer vision techniques21. Ye et al. (2023) developed an automated sorting system that classified crayfish size and maturity with 98.8% accuracy using an improved YOLOv5 algorithm22. Garabaghi et al. (2022) used a support vector machine (SVM) algorithm to classify healthy and unhealthy freshwater crayfish, evaluating the performance of the SVM model with various kernel functions19. Wang et al. (2022) developed a convolutional neural network (CNN)-based system for assessing the freshness of crayfish23, while Favaro et al. (2011) explored the potential of support vector machines for detecting the presence of white-clawed crayfish24. Chen et al. (2024) improved the SSD model with MobileNetv3 and used the Soft-NMS technique to develop a method for detecting crayfish heads, tails, and claws in real time with high accuracy and speed25. Li et al. (2022) applied deep learning in aquatic products for image detection, video detection, species classification, biomass estimation, behavior analysis, and food safety20. Zhang et al. (2020) achieved 97.9% accuracy in detecting sea cucumbers (120 samples) using a deep learning model trained with stochastic gradient descent (SGD)26. Borowicz et al. (2019) developed a system for recognizing whale species in aerial images using deep-learning models27. Eickholt et al. (2020) trained deep learning models to automatically identify fish species, thus enabling more effective monitoring and management of fish populations28. These studies demonstrate the high accuracy and efficiency of deep learning techniques in sex determination and species classification of crayfish and other aquatic organisms.
These studies and findings emphasize that sex determination in crayfish is not only biologically and ecologically important but also critical from an economic and management perspective. Accurate sex determination plays a fundamental role in understanding the reproductive cycles and demographic structures of crayfish populations, contributing to the optimization of reproductive management and population dynamics. Additionally, the protection of female crayfish during their breeding seasons is necessary to improve the fishing strategies and resource management of commercial crayfish species. In this context, this study aims to achieve sex determination in crayfish using deep learning methods. It is anticipated that deep learning technologies will provide significant advantages in both scientific and commercial applications by making this determination faster, more accurate, and more efficient.
Although machine learning algorithms perform well in classification tasks, several studies have aimed to enhance their performance by modifying key components, combining different classifiers, or employing alternative architectures such as Transformers instead of conventional deep learning models. For example, Kim et al. (2024) proposed a method called Heterogeneous Random Forest, which enhances the diversity — a key strength of the algorithm — to further improve its performance29. Nanni et al. (2023) conducted a promising study in the field of medical classification by combining convolutional neural networks with support vector machines through ensemble techniques to achieve improved performance30. Xie et al. (2025) proposed a two-stage framework called GAdaBoost, based on the AdaBoost algorithm, to address the label noise problem in classification tasks. The proposed method demonstrated strong performance in terms of robustness and efficiency31. Lu et al. (2025) proposed LRAD-ViT, a Vision Transformer–based model for Alzheimer’s disease detection, showing strong diagnostic performance and high computational efficiency32. Lu et al. (2025) proposed LAFAN-Net, a deep learning framework for tuberculosis and pneumonia diagnosis that integrates visual and textual information. The model effectively extracts clinically meaningful features, demonstrating its potential for improving diagnostic accuracy in chest X-ray analysis33. Lu et al. (2025) proposed CTBViT, a Vision Transformer–based model for tuberculosis classification that focuses on the most relevant image regions while effectively mitigating the overfitting problem34.
In this study, we aimed to compare both traditional and recently introduced classification methods for the crayfish sex identification problem using tabular and image-based datasets.
For the binary classification task, conventional machine learning algorithms, including Naïve Bayes, Support Vector Machines, Random Forest, K-Nearest Neighbors, and Artificial Neural Networks, were employed. In addition, a recently proposed method, the Kolmogorov–Arnold Network (KAN), was incorporated to provide a comparative evaluation against these traditional approaches. Furthermore, in the image-based part of the study, autoencoder and stacked autoencoder architectures based on convolutional neural networks were utilized as feature extraction mechanisms, and their performances were systematically compared across the same classification models.
To the best of our knowledge, our study is the first to use Kolmogorov-Arnold networks and autoencoders for sex classification in crayfish. Additionally, a unique feature extraction mechanism was developed by utilizing multiple autoencoders, and this architecture has significantly improved performance in Kolmogorov-Arnold networks, though not in all models.
The remainder of the paper is organized as follows. Section 2 provides information on data acquisition, statistical properties of the data, data preprocessing, machine learning models, the deep learning model, and autoencoders. Section 3 presents the evaluation metrics, statistical tests, experimental setup, and the results of the experiments and tests. Section 4 interprets the results and discusses potential future studies. Additionally, the Appendix details the search space used in hyperparameter optimization and the selected hyperparameters.
Materials and methods
Image dataset
Individuals of the species Pontastacus leptodactylus Eschscholtz, 1823 were obtained from local fishermen during the 2017 and 2018 fishing seasons in Eğirdir Lake, Beyşehir Lake, and Hirfanlı Lake. In this study, a total of 112 crayfish were examined, including 62 females and 50 males35. The specimens were transported to the laboratory for measurements such as weight (W), carapace length (CL), carapace width (Cw), abdomen length (AL), abdomen width (Aw), cheliped length (ChlL), cheliped width (Chw), and cheliped height (ChL). Additionally, the sex of the specimens was determined, and at least five images of each specimen were captured from both the dorsal (top) side and, after inverting the animal, the ventral (bottom) side, then recorded according to standard measurement specifications. A total of 1,277 images were used in the research. The sex of the crayfish was determined by examining specific anatomical features such as the reproductive organs (gonopores), the size and shape of the abdomen, the claspers, coloration, and size36.
In the tabular dataset, the class ratio among the 112 samples was 0.806:1 (male:female). To ensure a balanced evaluation, the data were shuffled and partitioned for the 10-fold cross-validation method such that each fold preserved the class distribution. Using this method, the distribution of female and male samples presented in Table 1 was obtained.
After this stratification step, missing values were imputed using the mean, the median, the mode, and the k-nearest neighbors algorithm with k = 5, yielding four different tabular datasets. In these datasets, outliers were corrected using the interquartile range (IQR) method for each numerical feature: values that fall outside the lower or upper boundary are treated as outliers and replaced with the closest boundary. Equations 1 and 2 are used to calculate the lower and upper boundaries, respectively.
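Reconstructed here in their standard IQR form from the definitions that follow (with multiplier \(k\)), the boundary equations are:

\(\text{Lower boundary} = Q_{1} - k \times IQR\) (1)

\(\text{Upper boundary} = Q_{3} + k \times IQR\) (2)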
\(Q_{1}\) represents the value below which 25% of the data falls, while \(Q_{3}\) represents the value below which 75% of the data falls. IQR is the difference between \(Q_{3}\) and \(Q_{1}\). In this study, the multiplier was set to 3. These operations were performed for each numerical feature.
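A minimal sketch of this imputation and outlier-capping step, assuming Scikit-learn's SimpleImputer and KNNImputer and a pandas DataFrame `df` holding the numerical features (variable names hypothetical):

```python
import pandas as pd
from sklearn.impute import SimpleImputer, KNNImputer

# Four imputation variants, as described above (k = 5 for the KNN imputer).
imputers = {
    "mean": SimpleImputer(strategy="mean"),
    "median": SimpleImputer(strategy="median"),
    "mode": SimpleImputer(strategy="most_frequent"),
    "knn": KNNImputer(n_neighbors=5),
}

def cap_outliers_iqr(values: pd.Series, k: float = 3.0) -> pd.Series:
    """Clip values outside [Q1 - k*IQR, Q3 + k*IQR] to the nearest boundary."""
    q1, q3 = values.quantile(0.25), values.quantile(0.75)
    iqr = q3 - q1
    return values.clip(lower=q1 - k * iqr, upper=q3 + k * iqr)

datasets = {}
for name, imputer in imputers.items():
    filled = pd.DataFrame(imputer.fit_transform(df), columns=df.columns)
    datasets[name] = filled.apply(cap_outliers_iqr)  # per-feature capping
```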
Since the mode-imputed dataset achieved the best cumulative accuracy across all models, a fifth dataset was created by applying Min-Max normalization to it. The normalization was performed using the MinMaxScaler class from the Scikit-learn library, which applies the operations defined in Eqs. 3 and 4. As the feature range was set to [0, 1], the data was scaled within this range.
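Equations 3 and 4, as defined in the Scikit-learn documentation for MinMaxScaler, are:

\(X_{std} = \frac{X - X_{min}}{X_{max} - X_{min}}\) (3)

\(X_{scaled} = X_{std} \times (max - min) + min\) (4)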
In this study, the MinMaxScaler class was used with its default hyperparameters. In these equations, \(X_{min}\) and \(X_{max}\) represent the minimum and maximum values of the corresponding feature, respectively. \(X_{std}\) denotes the normalized values of the features, and \(X_{scaled}\) represents the transformed version of the normalized data based on the specified \(min\) and \(max\) values. In this study, the data was normalized within the range of [0, 1].
Except for the dataset created with Min-Max normalization, data standardization was performed during the training and testing phases using the StandardScaler class from the Scikit-learn library. The data standardization process can be expressed by Eq. 5.
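Equation 5 is the standard score transformation applied by StandardScaler:

\(z = \frac{x - u}{s}\) (5)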
In this study, the StandardScaler class was used with its default hyperparameters. In this equation, \(u\) represents the mean of the training data, while \(s\) denotes the standard deviation of the training data. For the standardization of the test data, the mean \(u\) and standard deviation \(s\) obtained from the training data were used.
The image dataset of 112 specimens contains a total of 1,277 samples. Among these samples, 717 belong to female individuals, while 560 belong to male individuals. The class ratio in the dataset was calculated as 0.781:1. The dataset was split into 70% training and 30% testing, with this ratio being approximately maintained in both the training and test sets. The training set consists of a total of 895 samples, of which 501 are female and 394 are male. The test set contains a total of 382 samples, with 216 being female and 166 being male.
The image data was recorded in .jpg format with the RGB (red, green, and blue) color system, consisting of three channels and a resolution of 4608 × 3456 pixels. In this study, these images were converted to grayscale (one-channel) format and then resized to 28 × 28 pixels. The grayscale and resized images were transformed into tensor format and normalized with a mean of 0.5 and a standard deviation of 0.5. Using the preprocessed image data, training and test sets were obtained with the help of an autoencoder.
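A minimal sketch of this preprocessing chain, assuming the standard torchvision transforms API (the file path is hypothetical):

```python
from torchvision import transforms
from PIL import Image

# Grayscale -> 28x28 -> tensor in [0, 1] -> normalize with mean 0.5, std 0.5.
preprocess = transforms.Compose([
    transforms.Grayscale(num_output_channels=1),
    transforms.Resize((28, 28)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.5], std=[0.5]),
])

image = Image.open("crayfish_sample.jpg")  # hypothetical path
tensor = preprocess(image)                 # shape: (1, 28, 28)
```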
One of the main limitations of this study concerns potential variations in the image acquisition process. Although all samples were collected from three different lakes during the 2017–2018 fishing seasons, the dataset was created in collaboration with local fishermen. Therefore, it cannot be confirmed whether all images were captured using the same equipment or by the same operator. Such differences may have caused variations in lighting, shooting angle, or overall image quality, which could, in turn, affect the model’s ability to generalize to new conditions. Considering that real-world data are often collected by different people using different devices, it would be useful for future studies to examine how the proposed models perform under varying imaging setups and environmental conditions.
Differences in equipment and operators are commonly referred to in the literature as domain shift or device-induced variability, and are recognized as major factors that can hinder model generalization37,38. Previous research has shown that even when using the same network architecture, model performance can drop significantly if the data are collected with different cameras, scanners, or acquisition protocols38. For instance, Brown et al. (2024) reported that simply changing the camera used for image collection could alter classification outcomes39. Similarly, systematic reviews highlight that variations in acquisition conditions can lead to distributional shifts, ultimately impacting model performance38. From this perspective, the dataset used in our study may also have been affected by such variations in acquisition settings. To mitigate this limitation, future research could adopt season-based or location-based grouped validation strategies (e.g., leave-season-out or leave-location-out cross-validation), which help minimize data leakage and provide a more realistic assessment of model performance under real-world conditions40.
General framework
In this study, machine learning and deep learning algorithms were trained and tested on the tabular datasets generated during the data preprocessing stage, as described in Sect. 2.1, to perform binary classification of crayfish as male or female. Additionally, using the image data of crayfish, a convolutional autoencoder and a stacked convolutional autoencoder were employed to extract more abstract and meaningful features from the images. These newly extracted features were then used to train and test the same algorithms in a similar manner. Figure 1 illustrates the overall workflow of the proposed study, summarizing the main stages from data collection to model evaluation for both tabular and image datasets. The three general frameworks utilized in this study are presented in Figs. 2, 3, and 4.
Figure 2 presents the overall workflow designed for the experiments conducted on the tabular datasets. The workflow consists of three main steps. In the first step, data preparation and preprocessing were performed, where the .xlsx files were generated using different imputation methods such as mean, median, mode, and K-Nearest Neighbors, as described in Sect. 2.1 Data. In the second step, hyperparameter optimization was carried out to determine the most appropriate hyperparameters for each machine learning algorithm using the Ten-Fold Cross-Validation method. In the third step, the models were trained and evaluated, and the details of this process are provided in Sect. 3.3 Experimental Setups. Once the optimal hyperparameter sets were identified, each fold was used as a test set to assess the overall model performance, as specified in Table 1. This process was repeated ten times, ensuring that all data were used for both training and testing phases.
Within the framework presented in Fig. 3, a two-layer encoder-decoder architecture was employed for a convolutional autoencoder. This autoencoder was trained using the preprocessed image training dataset described in Sect. 2.1. For each layer, the number of input and output channels, kernel size, stride, and padding parameters were specified. After training, the trained weights were utilized to generate feature sets through the autoencoder. Once the feature sets were obtained, hyperparameter optimization was conducted using the Ten-Fold Cross Validation method on the training feature dataset for each algorithm. After determining the optimal hyperparameters, the models were trained with these parameters and subsequently tested.
The architecture shown in Fig. 4 extends the previous design by adding a second autoencoder. The feature set obtained from the encoder of the first autoencoder is used as the input to the second autoencoder. The second autoencoder has the same number of encoder layers as the first, but different numbers of input and output channels. Additionally, its decoder was redesigned to accommodate an input with 128 channels. The training and feature extraction procedure otherwise remains the same as in Fig. 3.
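As an illustration of this design, a minimal PyTorch sketch of a two-layer convolutional autoencoder follows. The channel counts, kernel sizes, strides, and paddings are illustrative assumptions; the exact values used in this study are those specified in Figs. 3 and 4.

```python
import torch
from torch import nn

class ConvAutoencoder(nn.Module):
    """Two-layer convolutional encoder-decoder (channel counts illustrative)."""
    def __init__(self, in_channels=1, mid_channels=16, latent_channels=32):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(in_channels, mid_channels, kernel_size=3, stride=2, padding=1),
            nn.ReLU(),
            nn.Conv2d(mid_channels, latent_channels, kernel_size=3, stride=2, padding=1),
            nn.ReLU(),
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(latent_channels, mid_channels, kernel_size=3, stride=2,
                               padding=1, output_padding=1),
            nn.ReLU(),
            nn.ConvTranspose2d(mid_channels, in_channels, kernel_size=3, stride=2,
                               padding=1, output_padding=1),
            nn.Sigmoid(),  # reconstruction
        )

    def forward(self, x):
        return self.decoder(self.encoder(x))

    def extract_features(self, x):
        # Flatten the latent feature maps into one vector per image.
        return self.encoder(x).flatten(start_dim=1)
```

For the stacked variant in Fig. 4, the latent output of the first encoder would be passed as the input to a second autoencoder of the same encoder depth, and the flattened latent maps of the final encoder would form the feature vectors fed to the classifiers.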
Canonical machine learning methods
Support Vector Machines (SVM) find the optimal hyperplane to separate data points in classification problems41. The margin is the distance between the hyperplane and the nearest data samples, called support vectors, and the optimal hyperplane is the one that maximizes this margin between the classes. During model training, the support vectors are identified, and the hyperplane is optimized accordingly42. For linearly non-separable datasets, the data are mapped into a higher-dimensional space using kernel functions; the most commonly used kernels for this purpose are the polynomial and radial basis functions43.
During the classification process of an input vector, it is compared with support vectors and mapped to a high-dimensional space through a kernel function. The values obtained from the function are weighted using Lagrange multipliers to predict the class to which the input belongs41.
Naïve Bayes (NB) is a probabilistic model that utilizes Bayes’ theorem. It assumes that each feature in the dataset is independent. The posterior probability is the probability that a given example belongs to a specific class, given its feature vector. In calculating this probability, prior probability, conditional probability, and evidence values are used. The prior probability represents the probability of an example belonging to a class. The conditional probability is the probability of a feature vector occurring given the class information. Here, the assumption of conditional independence of features is applied, and the conditional probabilities of each feature are multiplied. Evidence, on the other hand, is the probability of a feature vector occurring without considering class information44.
The NB classifier selects the highest posterior probability as its prediction. This probability is calculated by dividing the product of the prior probability and the conditional probability by the evidence probability. Different methods can be used to compute conditional probability. For data with continuous values, the Gaussian kernel is used. In this case, the standard deviation and mean of the features are calculated from the training data and applied accordingly44.
The K-Nearest Neighbors (KNN) method uses the nearest neighbor rule on pre-labeled data to classify a given sample. The value of K represents the number of neighbors considered in the labeled dataset, and the sample is assigned to the class of the majority of its nearest neighbors45. KNN-based density estimation offers an alternative to the fixed-volume approach used in kernel-based methods: instead of fixing the volume, the value of K is kept constant, and the volume containing the K nearest neighbors is used to estimate the local density46.
In the KNN method, there is no active training process for parameter optimization. The constructed model only uses labeled training data. When assigning a class to a given sample, the distances of K data points in the training set are measured based on specific distance metrics. Euclidean, Manhattan, Minkowski, and Hamming distances are among these metrics47.
Multilayer Perceptrons (MLP) are deep learning models also referred to as deep feedforward networks48. These architectures implement a function that maps an input to an output and can be used as a classification model. Typically, the input represents the feature vector of the instance to be classified. Instead of making a classification based on the output of a single function, the output of a function applied to the input vector can serve as the input to another function, and the output of that function can in turn be fed into yet another function. This chain structure continues depending on the network design, and the number of such functions determines the depth of the network. Each function in this structure corresponds to a layer, and the final layer, which determines the network's output, is called the output layer.
Each layer consists of processing units, known as neurons, that operate in parallel. These neurons are connected to the neurons in the previous layer through weighted connections. The information received by a neuron is obtained by multiplying the input vector from the previous layer with the connection weights and summing the resulting values. The obtained value is then processed by the neuron’s activation function to produce the final neuron output48.
The training of an MLP is conducted to align the outputs corresponding to input feature vectors with the true labels. This process is achieved by appropriately optimizing the connection weights within the network. First, the feedforward process is performed to obtain the predicted values at the output layer. Then, these predictions are compared with the true labels to calculate an error value. This error is expressed as a single cost value through a predefined loss function. Subsequently, the weights are updated using the backpropagation algorithm and gradient-based optimization methods based on this cost value. This process iterates until a predefined stopping criterion is met48.
Random Forest (RF) consists of multiple decision trees and utilizes the ensemble learning technique49. In RF, each decision tree has a vote for class prediction, and the final class prediction is determined based on this voting process. Each tree has a different structure, and the correlation between trees is reduced. Using a method called bagging, random samples from the training set are selected for each tree. During the splitting process in decision trees, the use of randomly selected features makes the model more robust to noise, enhancing its generalization ability.
After creating datasets for each tree using bagging, each decision tree is trained using Classification and Regression Tree (CART) algorithms50. Different types of metrics can be used to determine the feature that enables the split in tree construction. These metrics include Gini impurity, information gain, and mean squared error.
Stacked convolutional autoencoders
An autoencoder is a type of neural network composed of an encoder and a decoder mechanism48. The encoder transforms the input of the network into a lower-dimensional representation. The decoder reconstructs this representation to reproduce it in a way similar to the network’s input. Autoencoders are architectures that do not directly copy the input data to the output. The learning process of an autoencoder involves minimizing the loss function between the input data and the reconstructed data.
Convolutional neural networks (CNNs) are used for classification by extracting features from image data through convolutional layers, subsampling layers, and classification layers51. Convolutional layers extract features such as edges and shapes from image data to create representations. The subsampling process further reduces the size of feature maps, lowering computational cost. The classification layer uses the extracted features to perform classification.
Convolutional Autoencoders (CAEs) preserve the 2D structure of images and learn local features, unlike fully connected autoencoders52. In traditional autoencoders, each feature is spread across the entire image, whereas in CAEs, weight sharing through kernel usage enables a parameter-efficient approach. This allows the model to discover repeating patterns, obtain better representations, and reconstruct images in small patches, making it more effective in computer vision models. Multiple CAEs can be stacked together to form Stacked CAE (SCAE) structures. These autoencoders can serve as feature extraction mechanisms, providing datasets for classifiers such as SVM.
Stacked Convolutional Autoencoders (SCAEs) have demonstrated superior performance in feature extraction and classification tasks compared to Stacked Denoising Autoencoders (SDAs). Specifically, experiments on the MNIST dataset and real-world video data have shown that SCAEs produce more effective feature representations53. The SDAs method aims to enhance the robustness of autoencoders by adding artificial noise to the input data. Based on this approach, a different study combined SDA and SCAE techniques in a hybrid manner to develop the Stacked Convolutional Denoising Autoencoder (SCDAE) model. SCDAE has improved feature representations on datasets such as MNIST and CIFAR-10, thereby enhancing the performance of classifier models54. The Stacked Convolutional Sparse Autoencoder (SCSAE) model, based on the idea that neurons are not active at a certain time, was developed and achieved successful results on the CIFAR-10 and MNIST datasets55. SCAEs can be used as an initialization mechanism to improve the feature extraction performance of CNN models. Instead of initializing CNNs with random weights, starting training with convolutional kernels learned by SCAE significantly enhances the classification performance of traditional CNNs56.
Kolmogorov-Arnold networks
The Kolmogorov-Arnold Network (KAN) is an artificial neural network designed based on the Kolmogorov-Arnold theorem, which decomposes complex functions into univariate components. Particularly effective in classification and regression tasks, KAN is distinguished by its capacity to learn linear and nonlinear components separately, making it well-suited for modeling high-dimensional data. Unlike traditional neural networks, KAN employs B-spline-based transformations to decompose input features into multiple subcomponents, enabling it to capture both global and local variations, thereby enhancing generalization performance.
The fundamental principle of KAN is rooted in the Kolmogorov-Arnold representation theorem, which asserts that any multivariate continuous function can be expressed as a finite composition of continuous univariate functions and addition. Leveraging this concept, KAN applies B-spline interpolation to decompose multivariate input features, facilitating more effective modeling. Two key hyperparameters underpin its architecture: grid size (G), which determines the partitioning of the data space, and spline degree (S), which defines the nonlinear transformation capacity of each segment. This structure offers a significant advantage by enabling flexible modeling of nonlinear relationships in high-dimensional data spaces57.
KAN’s network architecture differs from conventional feedforward networks (MLPs). The input layer consists of neurons corresponding to each feature in the dataset, while the hidden layers incorporate B-spline-based transformation mechanisms that integrate linear and nonlinear transformations. The output layer utilizes sigmoid or softmax activation functions for classification tasks. By replacing traditional activation functions with adaptive B-spline transformations, KAN enables a more flexible and interpretable learning process, effectively mitigating common deep learning challenges such as vanishing gradients and saturation.
Although machine learning techniques such as Support Vector Machines (SVMs) and Artificial Neural Networks (ANNs) effectively learn nonlinear decision boundaries, KAN’s spline-based structure provides a more detailed feature transformation, leading to improved accuracy. Ultimately, spline-based transformations enhance classification performance by distinguishing linear and nonlinear components separately, while also increasing the model’s sensitivity to data distribution58.
Results
Evaluation metrics
In this study, the performance of different sex classification models was evaluated using metrics such as accuracy, sensitivity, precision, specificity, F1 score, and Matthews correlation coefficient (MCC). These performance metrics are calculated based on the true positive (TP), true negative (TN), false positive (FP), and false negative (FN) values obtained from the classification results.
According to the prediction performance of a classification model, correctly predicting samples with the actual value of male as male is defined as TP, while incorrectly predicting samples with the actual value of female as male is referred to as FP. Similarly, correctly predicting samples with the actual value of female as female is defined as TN, whereas incorrectly predicting samples with the actual value of male as female is defined as FN.
Accuracy is the ratio of the correct predictions made by the model to the total number of samples. Recall (sensitivity) is the model's ability to correctly identify the samples that should be predicted as positive. Specificity is the model's ability to correctly identify the samples that should be predicted as negative. Precision is the proportion of correctly predicted positive samples among all samples predicted as positive. The F1 score balances the performance of the model in terms of precision and recall. The MCC metric considers all values in the confusion matrix. The metric calculations are presented in Table 2.
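As a reference implementation of these definitions, the following sketch computes the Table 2 metrics directly from the confusion-matrix counts (assuming nonzero denominators):

```python
import math

def classification_metrics(tp: int, tn: int, fp: int, fn: int) -> dict:
    """Compute the evaluation metrics in Table 2 from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    recall = tp / (tp + fn)            # sensitivity
    specificity = tn / (tn + fp)
    precision = tp / (tp + fp)
    f1 = 2 * precision * recall / (precision + recall)
    mcc_denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / mcc_denom
    return {"accuracy": accuracy, "recall": recall, "specificity": specificity,
            "precision": precision, "f1": f1, "mcc": mcc}
```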
Statistical tests
Wilcoxon test
The Wilcoxon signed-rank test is a non-parametric statistical test method used to examine the significant difference in classification prediction accuracies between two models. The formula used in this method is given in Eq. 6.
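In its common signed-rank form, consistent with the symbol definitions below, the statistic is:

\(W = \sum_{i=1}^{N} \text{sgn}(d_{i}) \, R_{i}\) (6)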
In the equation, \(d_{i}\) represents the difference between the paired predictions for the same sample, while \(R_{i}\) denotes the rank of the absolute differences. The calculated W value is compared with the critical values from the Wilcoxon signed-rank distribution table. Additionally, the p-value can be computed to determine statistical significance.
McNemar’s test
The non-parametric McNemar statistical method uses model predictions to determine the statistical significance of the performance difference between two classification models. In a binary classification task, positive samples can be labeled as 1 and negative samples as 0. The two models to be compared, denoted CM1 and CM2, are tested on the same \(n\) samples. McNemar's test focuses on the samples where the models make different classification predictions. To perform the test, a contingency table is constructed from the values \(n_{11}\), \(n_{10}\), \(n_{01}\), and \(n_{00}\). \(n_{11}\) represents the number of samples classified as positive by both models, while \(n_{00}\) represents the number classified as negative by both. \(n_{10}\) represents the number of samples classified as positive by CM1 but negative by CM2, whereas \(n_{01}\) represents the number classified as negative by CM1 but positive by CM2.
The McNemar test statistic uses the \(n_{10}\) and \(n_{01}\) values from this table and is calculated as shown in Eq. 7.
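In its standard form without continuity correction, the statistic is:

\(\chi^{2} = \frac{(n_{10} - n_{01})^{2}}{n_{10} + n_{01}}\) (7)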
Statistical significance can also be determined by calculating the p-value.
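As an illustration, both tests can be computed from paired per-sample correctness vectors; the sketch below uses SciPy and toy prediction arrays (illustrative values only):

```python
import numpy as np
from scipy.stats import wilcoxon, chi2

# Toy example: true labels and predictions of two models (illustrative values).
y_true = np.array([1, 0, 1, 1, 0, 1, 0, 0, 1, 0])
pred_a = np.array([1, 0, 1, 0, 0, 1, 0, 1, 1, 0])
pred_b = np.array([1, 1, 1, 0, 0, 0, 0, 0, 1, 0])

correct_a = (pred_a == y_true).astype(int)
correct_b = (pred_b == y_true).astype(int)

# Wilcoxon signed-rank test on the paired per-sample outcomes
# (raises an error if the two models agree on every sample).
w_stat, w_p = wilcoxon(correct_a, correct_b)

# McNemar's test on the discordant pairs (Eq. 7, without continuity correction).
n10 = int(np.sum((correct_a == 1) & (correct_b == 0)))
n01 = int(np.sum((correct_a == 0) & (correct_b == 1)))
chi2_stat = (n10 - n01) ** 2 / (n10 + n01)
p_value = chi2.sf(chi2_stat, df=1)
```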
Experimental setup
For the five tabular datasets described in Sect. 2.1, Ten-Fold Cross Validation was performed both during hyperparameter optimization and, after the optimal hyperparameters were determined, during the training and testing phases. During hyperparameter optimization, models were trained and tested using each combination specified in the hyperparameter set, with one fold used for testing while the remaining folds were used for training. The best hyperparameter set was determined using the accuracy metric. Afterward, the model training process was repeated ten times, where nine folds were used for training and one fold for testing, with a different test fold in each iteration. As a result, a confusion matrix covering all 112 samples was generated for each model. For the NB models, hyperparameter optimization was not performed, and the models were trained and tested with default settings. In all datasets, except for the one created using Min-Max normalization, the data standardization shown in Eq. 5 was applied during both the hyperparameter optimization and the training and testing phases.
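A minimal sketch of this procedure, using a synthetic stand-in for the tabular data and an illustrative subset of the SVM search space (the full spaces are given in the appendix); stratified folds mirror the fold construction described in Sect. 2.1:

```python
import numpy as np
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for the 112-sample, 11-feature tabular dataset.
rng = np.random.default_rng(0)
X = rng.normal(size=(112, 11))
y = rng.integers(0, 2, size=112)

# The Pipeline couples standardization with the model so that scaling
# statistics are computed only from the training folds.
pipeline = Pipeline([("scaler", StandardScaler()), ("model", SVC())])

param_grid = {  # illustrative subset of the search space
    "model__C": [0.1, 1, 10],
    "model__kernel": ["rbf", "poly"],
}

cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=42)
search = GridSearchCV(pipeline, param_grid, cv=cv, scoring="accuracy")
search.fit(X, y)
print(search.best_params_, search.best_score_)
```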
To generate datasets through feature extraction, the autoencoder models were first trained. The hyperparameters of the convolutional layers are provided in Figs. 3 and 4. While preparing the training and test sets, the DataLoader object was used with a batch size of 16 and shuffle set to True. The Adam optimizer was used with a learning rate of 0.001 for a maximum of 100 epochs, with early stopping applied using patience = 5 and delta = 0.001. After completing the autoencoder training with the training data, the feature extraction method was executed using the learned weights, and the extracted features were used to generate training and test datasets for the machine learning and deep learning models.
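A minimal sketch of this training loop, reusing the ConvAutoencoder sketch above and random tensors in place of the preprocessed images; monitoring the training reconstruction loss for early stopping is an assumption:

```python
import torch
from torch import nn, optim
from torch.utils.data import DataLoader, TensorDataset

model = ConvAutoencoder()                  # sketch from Sect. 2.2
images = torch.rand(895, 1, 28, 28)        # stand-in for preprocessed images
loader = DataLoader(TensorDataset(images), batch_size=16, shuffle=True)

optimizer = optim.Adam(model.parameters(), lr=0.001)
criterion = nn.MSELoss()

best_loss, patience, delta, wait = float("inf"), 5, 0.001, 0
for epoch in range(100):                   # maximum of 100 epochs
    epoch_loss = 0.0
    for (batch,) in loader:
        optimizer.zero_grad()
        loss = criterion(model(batch), batch)  # reconstruction loss
        loss.backward()
        optimizer.step()
        epoch_loss += loss.item() * batch.size(0)
    epoch_loss /= len(loader.dataset)

    # Early stopping: halt when no improvement larger than delta is seen
    # for `patience` consecutive epochs.
    if epoch_loss < best_loss - delta:
        best_loss, wait = epoch_loss, 0
    else:
        wait += 1
        if wait >= patience:
            break
```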
After obtaining the training and test data through feature extraction, the procedure is similar to the other four datasets, except for the dataset created using Min-Max normalization. The difference here is that the best hyperparameters are determined using Ten-Fold Cross Validation on 70% of the training data. Then, the model is trained on the training data using the selected hyperparameters. The trained model is tested on the remaining 30% of the test data, and a confusion matrix is generated. The sizes of the training and test sets extracted from the autoencoder architecture shown in the framework in Fig. 3 are (895, 12800) and (382, 12800), respectively. The sizes of the training and test sets extracted from the stacked autoencoder architecture shown in the framework in Fig. 4 are (895, 18432) and (382, 18432), respectively.
The experiments within the framework shown in Fig. 2 were conducted using Python-based Scikit-learn, NumPy, and Pandas tools59,60,61. The models were used in the experiments with the Scikit-learn library. These models were derived from the MLPClassifier, KNeighborsClassifier, GaussianNB, RandomForestClassifier, and SVC classes. The StandardScaler class was used for data standardization, and the MinMaxScaler class was used for Min-Max normalization. GridSearchCV and Pipeline classes were utilized for hyperparameter optimization. The Pipeline includes standardization and the model. NumPy and Pandas libraries were used for processing and data analysis. The autoencoder architecture shown in the frameworks in Figs. 3 and 4 was implemented using the PyTorch library62. The convolutional layers were derived from the nn.Conv2d and nn.ConvTranspose2d classes. The nn.ReLU and nn.Sigmoid classes were used for activation function layers. The custom classes written for the autoencoders inherited from the nn.Module class. The optim.Adam class was used for autoencoder training, the nn.MSELoss class for loss computation, and the DataLoader class for data handling.
The KAN model was developed using the specialized KANLinear class, which facilitates spline-based nonlinear transformations. The architecture comprises six layers, with an input layer of 11 neurons, followed by hidden layers containing 256, 128, 64, and 32 neurons, respectively, and a sigmoid activation function in the output layer. During the learning process, input features undergo adaptive B-spline transformations, which integrate both spline-based and linear components to effectively capture global and local relationships within the data.
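As a minimal sketch of the described architecture, the following assumes a `KANLinear(in_features, out_features)` constructor as provided by common open-source efficient-KAN implementations; the exact class and its default spline hyperparameters used in this study may differ.

```python
import torch
from torch import nn
from efficient_kan import KANLinear  # assumed third-party implementation

class CrayfishKAN(nn.Module):
    """Six-layer KAN: 11 input features -> 256 -> 128 -> 64 -> 32 -> sigmoid output."""
    def __init__(self):
        super().__init__()
        self.layers = nn.Sequential(
            KANLinear(11, 256),
            KANLinear(256, 128),
            KANLinear(128, 64),
            KANLinear(64, 32),
            KANLinear(32, 1),
        )

    def forward(self, x):
        return torch.sigmoid(self.layers(x))

# Usage: probabilities for a batch of 8 samples with 11 tabular features.
probs = CrayfishKAN()(torch.rand(8, 11))
```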
To optimize the model, a Grid Search method was employed to identify optimal hyperparameters, selecting the most effective parameters based on accuracy metrics. Additionally, the Stochastic Gradient Descent (SGD) and Adam optimization algorithms were compared, with experimental results indicating that the Adam optimizer achieved superior accuracy. Consequently, the Adam algorithm was adopted for model training.
To mitigate overfitting, an early stopping mechanism was implemented, ensuring the training process was halted when further improvement was no longer observed. Furthermore, the ReduceLROnPlateau algorithm was applied to dynamically adjust the learning rate, enhancing the model’s adaptability and convergence efficiency.
The hyperparameter sets used in the frameworks in Figs. 2, 3, and 4, as well as the training hyperparameters obtained from the optimization process, are provided in the appendix of the paper. For MLP, KNN, GNB, RF, and SVM, the remaining hyperparameters are the default hyperparameters of Scikit-learn version 1.5.0. The hyperparameter optimization of KAN models was also performed using GridSearchCV, and the hyperparameter details are provided in the appendix of the paper.
Experimental results
The experimental results obtained on the datasets within the framework shown in Fig. 2 are presented in Tables 3, 4, 5, 6 and 7. Based on the results from all tabular datasets, the KAN model achieved the best performance across all metrics. In sex classification, accuracy is a relatively more important metric than the others, and SVM was the second-best performing model in this regard. The methods used for imputing missing values (mean, median, mode, and the KNN algorithm) influenced model performance. The sum of the accuracy columns is 4.597, 4.608, 4.625, and 4.58 for Tables 3, 4, 5 and 6, respectively. The results of the experiments using Min-Max normalization instead of standardization are presented in Table 7, where the total accuracy is 4.626. However, the performance of the best models, KAN and SVM, decreased compared to the results in Table 5. On the other hand, the accuracy of the distance-based KNN method increased by approximately 4%.
The results obtained from the autoencoder framework in Fig. 3 and the stacked autoencoder framework in Fig. 4 are presented in Tables 8 and 9, respectively. On the feature datasets extracted using the autoencoder, SVM achieved the best performance in terms of accuracy, while the MLP model showed the second-highest performance. The performance of the KAN model, however, decreased in experiments conducted on feature sets extracted from image data. On the higher-dimensional feature dataset extracted using stacked autoencoders, the KAN model improved its accuracy performance by 3%, achieving the best results. Although the SVM model experienced a 4% decrease in accuracy performance, it still demonstrated the second-best performance.
The Wilcoxon and McNemar's tests were performed on the model results obtained from the Min-Max normalized tabular dataset and the datasets extracted from the autoencoders. The Min-Max normalized dataset was selected because it had the highest total accuracy in the first framework shown in Fig. 2. The results of the Wilcoxon and McNemar's tests are presented in Tables 10, 11, 12, 13, 14 and 15, respectively. A 5% significance threshold was chosen, and where there was a statistically significant difference between two models, the corresponding value was highlighted in bold. Additionally, an arrow was added to the relevant cell to indicate the model that performed better in terms of accuracy.
Statistical analysis of the Wilcoxon test results indicates that, based on the tabular dataset results in Table 10, there is a significant difference between KAN and GNB, MLP and GNB, and SVM and GNB. Accordingly, these three models performed better than the GNB model. Based on the test results of the dataset obtained from the autoencoder in Table 11, there is a significant difference between GNB and all other models, as well as between KAN and RF, KAN and KNN, MLP and RF, SVM and KNN, and SVM and RF. The results show that the GNB model performed worse than the other models. The RF model also showed lower performance compared to the KAN, SVM, and MLP models. While the KNN model performed worse than the SVM model, it performed better than the KAN model. Based on the test results of the dataset obtained from the stacked autoencoder in Table 12, the GNB model showed the lowest performance compared to all other models. The RF model also performed worse than the KAN, MLP, and SVM models.
Statistical analysis of the McNemar’s test results revealed similar outcomes to those of the Wilcoxon tests across all experiments. The same model comparisons can be reiterated for this test as well. The most significant difference in these tests is that, in the dataset obtained using the autoencoder, the p-value calculated from the predictions of the SVM and KNN models is slightly above 0.05. Other statistical significances (or differences) observed in the Wilcoxon test have been confirmed in the same manner.
Discussion
In this study, machine learning and deep learning models were trained and tested on tabular datasets created by handling missing values in different ways for the crayfish sex classification problem. Since this is a sex classification task, although the results of several metric types are reported, accuracy can be used as the main evaluation criterion. For all models except the GNB model, key hyperparameters were selected and optimized. With the selected hyperparameters, the KAN model achieved the highest performance on the accuracy metric as well as on all other metrics. The exceptional performance of the KAN model can be attributed to its hybrid architectural framework, which integrates both linear and nonlinear representations through B-spline-based transformations. By leveraging this adaptive transformation mechanism, KAN effectively captures global structural patterns while simultaneously preserving local variations within the data. This capability enhances the model's expressiveness and generalization capacity, leading to more precise decision boundaries and improved classification performance across diverse feature distributions.

Among the remaining models, the SVM model achieved the highest accuracy. In the SVM model, the radial basis function (RBF) kernel was used to linearly separate the data in a high-dimensional space; in the classification performed on the tabular datasets with 11 features, the use of the RBF kernel was one of the key factors behind the SVM model's better performance. The GNB model performed worse than the other models across all datasets and metrics, which suggests that the features in the datasets may be strongly correlated.

The dataset in which missing values were filled using the mode of each feature performed best in terms of total accuracy. Therefore, a new dataset was created from it using Min-Max normalization, and the experiments were repeated without applying data standardization. On this dataset, accuracy improved by approximately 3% for the MLP model and 5% for the KNN model. Since the KNN model is based on the distances between samples, Min-Max normalization is a significant factor in its performance improvement; for the MLP model, the normalized data enabled better learning of the weights and improved the effectiveness of the activation functions.
In the second phase of the study, classification was performed on features extracted from the image data. Two architectures were designed for feature extraction: the first is a basic autoencoder, while the second is a stacked autoencoder created by chaining two autoencoders. On the datasets obtained from the basic autoencoder, the SVM model achieved the highest accuracy at 84%, the MLP model was second best, and the GNB model performed worst. Except for the KAN model, all models achieved better accuracy than on the tabular datasets; the higher-level features extracted by the autoencoder effectively improved their performance.

One of the most significant findings here is the substantial decline in the KAN model's performance compared to its results on the tabular datasets: its accuracy decreased by approximately 20%. Since the input layer of the KAN model is structured to match the number of features in the dataset, the amount of information entering the input layer grows directly with high-dimensional datasets. The gradual reduction in the number of neurons in subsequent layers then led to the loss of some features, resulting in information loss during learning. Specifically, the inability of KAN's adaptive grid structure to optimize effectively in very high-dimensional feature spaces limited the model's generalization ability, causing the drop in accuracy.

Finally, the study aimed to enhance model performance by obtaining higher-level features using stacked autoencoders. The MLP and SVM models experienced a 3% decrease in accuracy, as the higher-dimensional dataset negatively affected them. Meanwhile, the KAN model achieved the best performance with an accuracy of approximately 82%, an improvement of 4%. This success is attributed to KAN's spline-based transformations, which help reduce the influence of irrelevant components by learning only the most discriminative variables in high-dimensional feature spaces. The adaptive grid structure enabled the model to balance both global and local patterns in the data distribution, minimizing the risk of overfitting. Consequently, KAN demonstrated stable and reliable predictions even on high-dimensional feature sets; its 82% accuracy can be attributed to its ability to minimize information loss while creating flexible and generalizable decision boundaries.
In this study, Wilcoxon tests and McNemar’s tests were conducted to statistically evaluate the comparison of model predictions. Both tests yielded largely similar results. The KAN, MLP, and SVM models demonstrated statistically significantly better performance than the GNB model in all tests. In four tests related to the image dataset, the KAN, MLP, and SVM models showed statistically significantly better performance than the RF model.
Conclusion
In this study, an original dataset containing both tabular and image data was used to address the crayfish sex classification problem, and classical machine learning as well as deep learning algorithms were compared. The effects of different missing-value imputation techniques, normalization procedures, and autoencoder-based feature extraction approaches on model performance were examined.
The obtained results indicated that the KAN model achieved the highest overall accuracy across both data types. The features extracted through the autoencoder architecture enhanced the performance of classical models, with the SVM and MLP models showing strong results on image-based datasets. With the stacked autoencoder architecture, the KAN model not only improved its performance but also outperformed the other models, demonstrating better adaptability in high-dimensional feature spaces. The applied statistical tests (Wilcoxon and McNemar) confirmed that the KAN, MLP, and SVM models achieved statistically significantly better performance compared to the GNB model.
Future research can improve the results by building upon this study. First, the amount of data can be increased through natural data collection methods or by generating synthetic data using deep learning-based generative approaches. Then, all the processes used in this study can be repeated in the same manner. Additional classification methods can be incorporated into the established frameworks, or the feature extraction mechanism used with autoencoders can be improved. Furthermore, the existing classification architectures in this study can be applied to other species where sex classification is important.
Data availability
The datasets generated and/or analysed during the current study are available in the Zenodo repository: https://doi.org/10.5281/zenodo.17516963. The source codes developed for the experiments are stored in a GitHub repository at https://github.com/yasinatilkan60/Crayfish-Sex-Identification.
References
Pastorino, P. et al. The invasive red swamp crayfish (Procambarus clarkii) as a bioindicator of microplastic pollution: insights from Lake Candia (northwestern Italy). Ecol. Ind. 150, 110200 (2023).
Piscart, C. et al. In Identification and Ecology of Freshwater Arthropods in the Mediterranean Basin 157–223 (Elsevier, 2024).
Muruganandam, M. et al. Impact of climate change and anthropogenic activities on aquatic ecosystem–A review. Environ. Res. 238, 117233 (2023).
Özdoğan, H. B. & Koca, H. U. Effects of different diets on growth and survival of first feeding second-stage juvenile Pontastacus leptodactylus (Eschscholtz, 1823)(Decapoda, Astacidea). Crustaceana 96, 673–682 (2023).
Đuretanović, S., Rajković, M. & Maguire, I. Ecological Sustainability of Fish Resources of Inland Waters of the Western Balkans: Freshwater Fish Stocks, Sustainable Use and Conservation 341–374 (Springer, 2024).
Suryanto, M. E. et al. Using crayfish behavior assay as a simple and sensitive model to evaluate potential adverse effects of water pollution: emphasis on antidepressants. Ecotoxicol. Environ. Saf. 265, 115507 (2023).
Kazery, J. A. et al. Internal and external spatial analysis of trace elements in local crayfish. Environ. Technol., 1–14 (2024).
Jin, S. et al. Length-based stock assessment for Procambarus clarkii aquaculture management in China: an alarming of ongoing recruitment overfishing. Aquaculture 579, 740182 (2024).
McLay, C. L., van den Brink, A. M., Longshaw, M. & Stebbing, P. Crayfish growth and reproduction. Biol. Ecol. Crayfish 62–116 (2016).
Budd, A. M., Banh, Q. Q., Domingos, J. A. & Jerry, D. R. Sex control in fish: approaches, challenges and opportunities for aquaculture. J. Mar. Sci. Eng. 3, 329–355 (2015).
Crandall, K. A. & De Grave, S. An updated classification of the freshwater crayfishes (Decapoda: Astacidea) of the world, with a complete species list. J. Crustacean Biol. 37, 615–653 (2017).
Dargan, S., Kumar, M., Ayyagari, M. R. & Kumar, G. A survey of deep learning and its applications: a new paradigm to machine learning. Arch. Comput. Methods Eng. 27, 1071–1092 (2020).
Lan, K. et al. A survey of data mining and deep learning in bioinformatics. J. Med. Syst. 42, 1–20 (2018).
Lauriola, I., Lavelli, A. & Aiolli, F. An introduction to deep learning in natural language processing: models, techniques, and tools. Neurocomputing 470, 443–456 (2022).
Bambil, D. et al. Plant species identification using color learning resources, shape, texture, through machine learning and artificial neural networks. Environ. Syst. Decisions. 40, 480–484 (2020).
Khanmohammadi, R., Mirshafiee, M. S., Ghassemi, M. M. & Alhanai, T. Fetal gender identification using machine and deep learning algorithms on phonocardiogram signals. arXiv (2021).
Atilkan, Y. et al. Advancing crayfish disease detection: A comparative study of deep learning and canonical machine learning techniques. Appl. Sci. 14, 6211 (2024).
Korfmann, K., Gaggiotti, O. E. & Fumagalli, M. Deep learning in population genetics. Genome Biol. Evol. 15, evad008 (2023).
Garabaghi, F. H., Benzer, R., Benzer, S. & Günal, A. Ç. Effect of polynomial, radial basis, and Pearson VII function kernels in support vector machine algorithm for classification of crayfish. Ecol. Inf. 72, 101911 (2022).
Li, J. et al. Deep learning for visual recognition and detection of aquatic animals: A review. Rev. Aquac. 15, 409–433 (2023).
Hasan, Y. & Siregar, K. Computer vision identification of species, sex, and age of Indonesian marine lobsters. INFOKUM 9, 478–489 (2021).
Ye, X. et al. Rapid and accurate crayfish sorting by size and maturity based on improved YOLOv5. Appl. Sci. 13, 8619 (2023).
Wang, C. et al. Convolutional neural network-based portable computer vision system for freshness assessment of crayfish (Procambarus clarkii). J. Food Sci. 87, 5330–5339 (2022).
Favaro, L., Tirelli, T. & Pessani, D. Modelling habitat requirements of white-clawed crayfish (Austropotamobius pallipes) using support vector machines. Knowl. Manag. Aquat. Ecosyst. 21 (2011).
Chen, Y. et al. Study on positioning and detection of crayfish body parts based on machine vision. J. Food Meas. Charact. 18, 4375–4387 (2024).
Zhang, H., Yu, F., Sun, J., Shen, X. & Li, K. Deep learning for sea cucumber detection using stochastic gradient descent algorithm. Eur. J. Remote Sens. 53, 53–62 (2020).
Borowicz, A. et al. Aerial-trained deep learning networks for surveying cetaceans from satellite imagery. PLoS ONE 14, e0212532 (2019).
Eickholt, J., Kelly, D., Bryan, J., Miehls, S. & Zielinski, D. Advancements towards selective barrier passage by automatic species identification: applications of deep convolutional neural networks on images of dewatered fish. ICES J. Mar. Sci. 77, 2804–2813 (2020).
Kim, Y., Kim, S. Y. & Kim, H. Heterogeneous random forest. arXiv (2024).
Nanni, L., Brahnam, S., Loreggia, A. & Barcellona, L. Heterogeneous ensemble for medical data classification. Analytics 2, 676–693 (2023).
Xie, Q., Zhang, Q., Xia, S., Zhou, X. & Wang, G. GAdaBoost: an efficient and robust AdaBoost algorithm based on granular-ball structure. Knowl. Based Syst., 113898 (2025).
Lu, S. Y., Zhang, Y. D. & Yao, Y. D. A regularized transformer with adaptive token fusion for Alzheimer’s disease diagnosis in brain magnetic resonance images. Eng. Appl. Artif. Intell. 155, 111058 (2025).
Lu, S. Y., Zhu, Z., Zhang, Y. D. & Yao, Y. D. Tuberculosis and pneumonia diagnosis in chest X-rays by large adaptive filter and aligning normalized network with report-guided multi-level alignment. Eng. Appl. Artif. Intell. 158, 111575 (2025).
Lu, S. Y., Zhu, Z., Tang, Y., Zhang, X. & Liu, X. CTBViT: A novel ViT for tuberculosis classification with efficient block and randomized classifier. Biomed. Signal Process. Control. 100, 106981 (2025).
Benzer, S. Crayfish Sex Classification Dataset. https://doi.org/10.5281/zenodo.17516963 (2025).
Yazicioglu, B., Reynolds, J. & Kozák, P. Different aspects of reproduction strategies in crayfish: A review. Knowl. Manag. Aquat. Ecosyst., 33 (2016).
Guan, H. & Liu, M. Domain adaptation for medical image analysis: a survey. IEEE Trans. Biomed. Eng. 69, 1173–1185 (2021).
Matta, S. et al. A systematic review of generalization research in medical image classification. Comput. Biol. Med. 183, 109256 (2024).
Brown, J., Nguyen, A. & Raj, N. Effect of camera choice on image-classification inference. Appl. Sci. 15, 246 (2024).
Bradshaw, T. J., Huemann, Z., Hu, J. & Rahmim, A. A guide to cross-validation for artificial intelligence in medical imaging. Radiology: Artif. Intell. 5, e220232 (2023).
Cortes, C. & Vapnik, V. Support-vector networks. Mach. Learn. 20, 273–297 (1995).
Nalepa, J. & Kawulok, M. Selecting training sets for support vector machines: a review. Artif. Intell. Rev. 52, 857–900 (2019).
Jakkula, V. Tutorial on support vector machine (SVM). School EECS Wash. State Univ. 37, 3 (2006).
Raschka, S. Naive Bayes and text classification I: introduction and theory. arXiv (2014).
Cover, T. & Hart, P. Nearest neighbor pattern classification. IEEE Trans. Inf. Theory. 13, 21–27 (1967).
Bishop, C. M. & Nasrabadi, N. M. Pattern Recognition and Machine Learning Vol. 4 (Springer, 2006).
What is the k-nearest neighbors (KNN) algorithm? https://www.ibm.com/think/topics/knn (2025).
Goodfellow, I., Bengio, Y. & Courville, A. Deep Learning (MIT Press, 2016).
Breiman, L. Random forests. Mach. Learn. 45, 5–32 (2001).
What is random forest? https://www.ibm.com/think/topics/random-forest (2025).
LeCun, Y., Bottou, L., Bengio, Y. & Haffner, P. Gradient-based learning applied to document recognition. Proc. IEEE. 86, 2278–2324 (1998).
Masci, J., Meier, U., Cireşan, D. & Schmidhuber, J. Stacked convolutional auto-encoders for hierarchical feature extraction. In Artificial Neural Networks and Machine Learning – ICANN 2011, Espoo, Finland, June 14–17, Proceedings, Part I, 52–59 (Springer, 2011).
Schmid, U., Günther, J. & Diepold, K. Stacked denoising and stacked convolutional autoencoders. (2017).
Du, B. et al. Stacked convolutional denoising auto-encoders for feature representation. IEEE Trans. Cybern. 47, 1017–1027 (2016).
Zhu, Y., Li, L. & Wu, X. Stacked convolutional sparse auto-encoders for representation learning. ACM Trans. Knowl. Discovery Data (TKDD). 15, 1–21 (2021).
Tan, S. & Li, B. Stacked convolutional auto-encoders for steganalysis of digital images. In Signal and Information Processing Association Annual Summit and Conference (APSIPA) 1–4 (IEEE, 2014).
Liu, Z. et al. KAN: Kolmogorov-Arnold networks. arXiv (2024).
Ibrahum, A. D. M., Shang, Z. & Hong, J. E. How resilient are Kolmogorov–Arnold networks in classification tasks? A robustness investigation. Appl. Sci. 14, 10173 (2024).
Harris, C. R. et al. Array programming with NumPy. Nature 585, 357–362 (2020).
McKinney, W. Data structures for statistical computing in Python. SciPy 445, 51–56 (2010).
Pedregosa, F. et al. Scikit-learn: machine learning in Python. J. Mach. Learn. Res. 12, 2825–2830 (2011).
Paszke, A. et al. PyTorch: an imperative style, high-performance deep learning library. Adv. Neural Inf. Process. Syst. 32 (2019).
Author information
Contributions
Conceptualization, R.B. and S.B.; Methodology, Y.A., K.A. and T.A.; Software, Y.A., E.T.A. and B.K.; Validation, M.S.G. and F.E.; Data Curation, R.B. and S.B.; Writing—Original Draft Preparation, Y.A., E.T.A., K.A., T.A. and R.B.; Writing—Review and Editing, Y.A., K.A., T.A., R.B., M.S.G. and F.E.; Visualization, Y.A., K.A. and T.A.; Supervision, T.A., K.A., M.S.G., R.B. and S.B. All authors have read and agreed to the published version of the manuscript.
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Atilkan, Y., Kirik, B., Acikbas, E.T. et al. Enhancing crayfish sex identification with Kolmogorov-Arnold networks and stacked autoencoders. Sci Rep 16, 3971 (2026). https://doi.org/10.1038/s41598-025-34095-z