Abstract
Mental health challenges among Indian farmers are a critical yet under-researched public health problem, especially in rural areas where access to mental health professionals is limited. Stress from crop failure, fluctuating prices, debt, and poor social support often leads to deterioration in well-being. Traditional survey methods have been used to measure stress but are limited by manual interpretation, subjectivity, and lack of scalability for rural deployment. This study proposes a diagnostic model based on a convolutional neural network (CNN) that analyses the spoken responses of farmers to a structured questionnaire that focuses on stress levels, coping mechanisms, and social support. The audio responses of 350 farmers were collected in local languages, converted into spectrograms, and processed through a CNN architecture, selected for its ability to learn spatial hierarchies without manual feature engineering. The research objectives were (i) to develop a scalable voice-based CNN model for mental health assessment and (ii) to validate its usability in rural contexts. The hypotheses tested were that (H1) the CNN would classify mental health status with high precision and (H2) the system would demonstrate strong usability. The results confirmed high predictive accuracy (99.67%) and strong performance in six usability factors (learnability, efficiency, configurability, satisfaction, understandability, and effectiveness), indicating the feasibility of the model for integration into rural healthcare outreach.
Introduction
India is a large country in which more than 151 million people depend on agriculture for their livelihood. Approximately 60% of the Indian population works in this sector, which contributes approximately 18% of India’s GDP1. After the Green Revolution of the 1960s and 1970s, Indian agricultural practices changed drastically, and rich people gained power and wealth at the expense of farmers2. In the 1990s, liberalization and globalization also affected Indian agricultural policies: the open market was prioritized, which had an adverse effect on small farmers because international companies entered the agriculture sector3. In 1990, policies were revised and public investment in agriculture was reduced, contributing to farmer suicides and poor mental health3. For these reasons, the farming population had fallen to 32% by the end of 2022, and farmers continued to suffer losses4. Many Indian farmers also have limited formal education and, as a result of all these issues, are poorly placed to put their needs and proposals before the government, while some political parties try to take advantage of this. As a result, Indian farmers suffer from mental distress, depression, and anxiety, and the suicide rate among them is increasing. According to data from the National Crime Records Bureau (NCRB), more than 358,164 farmers died by suicide between 1995 and 2019, and the situation remains unchanged5. The report shows that in India the farmer suicide rate is 1.5 times higher than the national average6. These facts show that the conditions of farmers in India are poor and that action must be taken to reduce the suicide rate and improve their mental health. This study seeks to explore several questions related to the development of an intelligent model to examine the mental health of farmers in India.
It aims to understand the primary mental health challenges faced by Indian farmers and to investigate whether vocal and speech-based characteristics can effectively serve to detect mental health conditions. Another important area of inquiry is how this digital model can be integrated into rural healthcare systems to enable mental health screening. The study also considers the usability and accessibility challenges of deploying this technology in rural settings associated with the use of AI for mental health.
To overcome these shortcomings, the present study aimed to introduce a novel deep learning–based diagnostic model to assess the mental health of Indian farmers by analyzing their spoken responses to a structured questionnaire. The farmer responses collected in the vernacular are converted into spectrogram representations and then analyzed using CNN. We use CNNs instead of traditional machine learning models because of their ability to capture spatial hierarchies from the inputs (spectrogram) automatically without manual feature engineering. The purpose of the paper is to develop and validate a scalable and user-friendly CNN architecture for mental health conditions for Indian farmers with a focus on stress levels, coping mechanisms, and social support.
Research Hypotheses:
H1
The farmer voice-based CNN model can perform with high accuracy in classifying mental health status compared to traditional approaches.
H2
The system will exhibit good usability, allowing farmers to perform tasks efficiently and satisfactorily.
To test these hypotheses, a structured questionnaire was constructed that included questions about stress, coping strategies, and social support. Information was collected from 350 farmers through face-to-face communication in the field. The audio data set is used to train and test the proposed model. In addition to predictive performance, the validation of the model was structured according to six usability aspects: learnability, efficiency, configurability, satisfaction, understandability, and effectiveness. This two-fold approach (i.e., focus on technical performance and user-driven validation) guarantees that the model is not only precise but also practical for rural use.
The contribution of this work, thus, lies in that it offers (i) a CNN-based mental health assessment model designed for the agricultural community, and (ii) a usability validation framework based on the evidence, which shows the feasibility of adapting the model in rural health outreach programs. By addressing the gaps in availability and diagnostic reliability, this study provides a replicable model to facilitate intervention and policy measures aimed at improving farmer well-being in India.
This paper has 5 sections. Section “Related work” focuses on the related work, whereas Sect. “Problem statement and solution” discusses the problem statement and its possible solution. Section “Proposed methodology” describes the proposed work and analyzes the results, and Sect. “Results and discussions” focuses on the conclusion of this research work along with the future scope.
Related work
Reasons for poor mental health of Indian farmers
Agriculture is the pillar of the Indian economy and employs more than 151 million people, almost 60% of the total population. Although socioeconomically significant, Indian farmers suffer from extreme mental illness driven by a host of socioeconomic, environmental, and policy-induced causes. These encompass chronic financial indebtedness, fluctuating crop production due to capricious climate patterns, weak support mechanisms from the government, fluctuating market prices, and recurrent experience of natural disasters in the form of droughts, floods, and cyclones1. In several instances, such stressors are compounded by social pressures, ignorance, and the lack of community-focused mental health services. Consequently, mental health in farmers is a greatly overlooked aspect of public health care in rural India.
Agrarian policies have historically had a deep influence on the mental well-being of farmers. One example is the Green Revolution of the 1960s, which vastly increased agricultural production but benefited rich farmers disproportionately. This policy unintentionally drove small farmers to the margins by widening economic disparities, creating sources of mental health stress2. Similarly, the economic liberalization and globalization policies undertaken in the 1990s opened India to international competition and exposed domestic farmers to price fluctuation, de-subsidization, and reduced bargaining power3. These changes increased operational risk for marginal farmers and deepened their existing vulnerabilities, creating conditions conducive to psychological distress.
The structural weaknesses of the agrarian economy were starkly revealed in 2018, when frustration turned into mass farmer protests. Around 200 farmers protested in Delhi, Mumbai and other cities, demanding an improved minimum support price (MSP) and special parliamentary debates on agrarian distress. These protests soon spread across the states, sometimes turned violent, and led to injuries and deaths, including suicides, highlighting the extent of the associated mental health concerns6. Statistical analysis indicates that the Indian farmer suicide rate is 47% higher than the national rate. Almost 48 suicides are registered each day, and about 16,500 suicides occur each year7.
The COVID-19 pandemic further intensified these strains, causing unprecedented shocks through the value chain in agriculture. Farmers struggled with acute labour shortages, reduced access to seeds and fertilizers, reduced transport logistics, and reduced market access. Financial relief from government agencies tended to be delayed or inadequate, worsening desperation and hopelessness in the agricultural community. Suicides among farm workers increased by 18% in 2020 over other years, as the compounding effects of economic insecurity and health crises took their toll. These figures underscore the compounding stress effects faced by farmers from the nexus between policy failure, economic instability, and health crises.
Present state of research in Indian farmers’ mental health
There is growing academic focus on the mental well-being of Indian farmers owing to increasing rates of suicide and widespread psychosocial stress in agricultural populations. In 2018, an economic study registered 13,754 suicides in 2012 and cited underlying socioeconomic causes, including fragmentation of land, crop failure, and cycles of indebtedness10. Kallakuri et al.12 examined mental well-being in 12 Andhra Pradesh villages and reported rates of anxiety (10.8%), depression (14.4%), and suicidal thoughts (3.5%). These statistics highlight the necessity of incorporating mental health services into the primary care system in rural areas.
A 2019 overarching systematic review collated key risk indicators such as environmental fluctuation, insecure sources of income, chronic illness, and isolation from institutional support networks11. In a seminal study on South Indian farmers, more than 90% of respondents exhibited symptoms of depression, highlighting the magnitude of mental distress in the countryside areas12. A 2020 study in Maharashtra also converged with the findings, revealing anxiety and sleeplessness in 55% of the sample and somatic symptoms in 34.7%. The mean mental health index scored 0.58 with a standard deviation of 0.49, reflecting far-reaching psychological impairment13.
Chitrasena et al.1614 underscored the fact that declining market returns and declining commodity prices have exacerbated the psychological loads on farmer households. A meta-analysis of 92 peer-reviewed papers documented a range of methodological styles toward mental health measurement—quantitative surveying, ethnographic case studies, empirical field research, and mixed-method designs—underscoring the dynamic interplay between stressors in the environment and mental health15. The evidence clearly shows that mental health in rural India cannot be considered in solitary terms but needs to be recognized as a multidimensional construct driven by determinants in the form of structural, economic, and environmental elements.
The COVID-19 pandemic exacerbated all this, leading to chain reactions in the form of agrarian social and financial crises and mental ones. Various post-pandemic research studies have documented the multi-faceted effects with a focus on both immediate and residual mental health effects16. A targeted 2022 review of 29 papers evaluated the link between disasters and the psychological effects in farmer populations. The evidence persistently showed elevated rates of post-traumatic stress, anxiety disorders, and depression in the wake of such incidents17. These effects necessitate the call for an interdisciplinary action from public health and social welfare and agricultural development.
In response, technological interventions have also been considered as mitigating tools. Internet-based applications and online counselling portals have proven to be potentially scalable and accessible avenues for mental health assistance to farmers. These digital interventions not only help individual farmers but also empower paramedical staff with decision-making support and remote monitoring systems18. At the same time, global climate change, marked by rising temperatures, unpredictable monsoons, and depleting groundwater levels, has increased rural stress. Such environmental deterioration influences mental health directly, through physical exhaustion and crop losses, and indirectly, through economic insecurity and pressure to migrate. Researchers promote multi-stakeholder interventions involving policy reforms, community mental health activities, government-run counselling services, and resilience-based models adjusted to agrarian contexts19,20.
Problem statement and solution
The literature review shows a growing recognition of the need for mental health interventions for farmers in India, including counselling and support services. There is also a growing trend towards the use of digital technologies, such as mobile health (mHealth) and telemedicine. An intelligent computer-based digital model can be used to examine the mental health of farmers, but this research area is still in its early stages in India. Although some studies and initiatives have aimed to use digital technologies to improve access to mental health services for farmers in India, much more work is needed to fully understand the potential benefits and challenges of such models.
To address these issues, research is needed on intelligent computer-based models that examine farmers’ mental health efficiently and are easy to use. This paper is an attempt in this direction: a CNN-based intelligent computer model is proposed to examine the mental health of Indian farmers. Because many Indian farmers are not familiar with the latest technologies, data are collected as audio recordings of their spoken answers to a questionnaire. The questionnaire is prepared based on different factors that affect the mental health of farmers.
Intelligent computer-based digital model
The Intelligent Computer-Based Digital Model is proposed to provide an efficient assessment tool for farmers’ mental health. The model utilizes artificial intelligence and machine learning techniques to examine the mental health status of Indian farmers by processing audio data, considering different factors such as stress levels, coping mechanisms, and social support systems. The model can be used to monitor the well-being of farmers over time, providing early warning signs of potential mental health problems.
Convolutional neural network
The CNN is a widely used deep neural network that is applied mainly in computer vision and also performs well on audio data sets. A CNN consists of several types of layers. The input layer receives the input data. The convolutional layer is responsible for extracting features from the input. After that, the pooling layer summarizes the features generated by the previous layer. Just before the output layer, there is a fully connected layer, which connects the neurons of adjacent layers23. This basic architecture of CNN is shown in Fig. 1. In this work, vocal data from Indian farmers, captured while they answer the questionnaire, serve as input and are classified into seven categories (anger, disgust, fear, happiness, sadness, astonishment, and neutral).
Reinforcement learning
Reinforcement learning (RL) is a learning approach in which an intelligent agent takes actions so as to maximize its reward points21,22. In this work, farmers are treated as agents, and the accuracy of the results together with the value of the usability factor is treated as the reward. RL is proposed for the model in case the accuracy of the result is low or the value of the usability factor is less than 0.5. Both requirements (adequate accuracy and usability values of at least 0.5) are met in this work; however, we have cross-validated our model in terms of reinforcement learning, as described in Sect. "Quantitative evaluation of the proposed model". The following notation is used:
FMHA: Action required to update the proposed model for examining farmer’s mental health (FMH), depending on the initial results.
FMHIR: Initial results of examining farmer’s mental health, together with the usability factor value.
FMHRP: Reward points used to validate the proposed model.
FMHP: Policy decided for the farmer’s mental health model according to FMHRP.
FMHQ: Changes required in the strategy for updating the questionnaire.
Learning rate (β): 0.1–0.001.
Discount Factor (γ): 0–1.
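The text does not name a specific RL algorithm, so the following is only a minimal sketch under the assumption that a tabular Q-learning update is used, with the learning rate β and discount factor γ taken from the ranges above. The state names, action names, and reward value are hypothetical placeholders introduced for illustration.

```python
def q_update(q, state, action, reward, next_state, beta=0.1, gamma=0.9):
    """One tabular Q-learning step:
    Q(s, a) += beta * (r + gamma * max_a' Q(s', a') - Q(s, a))."""
    # A terminal state has no actions, so its best future value is 0.
    best_next = max(q[next_state].values()) if q.get(next_state) else 0.0
    td_target = reward + gamma * best_next
    q[state][action] += beta * (td_target - q[state][action])
    return q[state][action]


# Hypothetical states and actions (not taken from the paper).
q_table = {
    "low_usability": {"keep_questionnaire": 0.0, "revise_questionnaire": 0.0},
    "validated": {},  # terminal state
}

# Assumed reward of 1.0 when revising the questionnaire leads to a validated model.
new_value = q_update(
    q_table, "low_usability", "revise_questionnaire",
    reward=1.0, next_state="validated",
)
print(new_value)
```

In this reading, FMHA corresponds to the action, FMHRP to the reward, and FMHP to the greedy policy derived from the table.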
Proposed methodology
The proposed iCDAM (Intelligent Computer-Based Digital Model) methodology to assess the mental health of Indian farmers is a multimodal audio processing framework that leverages deep learning for emotion-driven inference. The complete methodology is described in Fig. 2 and consists of the following phases:
Collect farmer’s requirements
Initially, Indian farmers’ requirements are gathered, for which a survey was conducted with Indian farmers of different geographical regions.
Identification of problem
After interacting with Indian farmers of different geographical regions, it was identified that there is a lack of communication between farmers and others (leaders, government, professional bodies, etc.). Indian farmers are not educated enough to be able to use the latest tools and technologies and communicate their problems efficiently. They need an efficient and intelligent computer model to share their opinions and thoughts. Therefore, we have decided to propose an intelligent model that can be trained using audio data from Indian farmers.
Vocal dataset collection using a questionnaire for training of the model
A questionnaire was prepared considering stress levels, coping mechanisms, social support systems, and related factors. Vocal data from Indian farmers were collected while they answered the questions. The inclusion criteria were farmers aged 20–65 years who had been farming for at least the last five consecutive years, were able to understand and converse in Hindi or the local dialect, and were willing to participate and be audio recorded. Farmers who had a severe psychiatric diagnosis or disorder, had hearing or speech impairments, or were unwilling to have their sessions recorded were excluded. Data were collected in five local sessions, each attended by approximately 70 farmers, after informed consent and a brief introduction to the objectives of the study. A pretested structured questionnaire, based on a review of the literature, modifications of the WHO well-being index, and expert opinion, was used to assess stress (frequency, intensity, and causes), coping (adaptive and maladaptive), and social support (family, peers, and social institutions). To overcome literacy barriers and capture natural language use, participants’ responses were recorded as audio files using handheld digital recorders (44.1 kHz), and every session lasted 40 min to keep the recordings consistent. The recordings were later transformed into spectrograms for CNN-based analysis, because the spectrogram is a widely used visual representation in speech emotion recognition that allows the network to learn hierarchical spatial features without manual feature extraction. As part of model training, participant feedback was used to assess the six usability constructs (learnability, efficiency, effectiveness, configurability, satisfaction, and understandability), based on the ISO 9241-11 human–computer interaction standard, to ensure that the system is not only technically robust but also operationally applicable in rural healthcare settings.
Annotation of this audio dataset is done based on various factors present in audio responses, such as emotions, behavior, and average time delay. These categories are anger, disgust, fear, happiness, sadness, astonishment, and neutrality. The audio data collected are shown in Fig. 3.
The bar plot represents the amplitude distribution of audio samples collected from the farmers via questionnaire, showcasing signal variation across time frames. https://www.kaggle.com/datasets/dmitrybabko/speech-emotion-recognition-en.
Dataset for validation of the proposed model
This work uses a gender-dominated Hindi speech database known as Common Voice Hindi (Indian), which is assumed to be a subset of the Mozilla Common Voice corpus filtered and recorded specifically targeting the Indian accents. The data set consists of 2498 audio samples in MP3 format. The duration of each audio clip is short, on the order of 10 s, making the data set amenable for real-time or lightweight scenarios. Dataset link: https://www.kaggle.com/datasets/dmitrybabko/speech-emotion-recognition-en
Data pre-processing and splitting
The following processing steps were applied to improve the reliability and generalization of our model. First, noise filtering: a Butterworth bandpass filter (300 Hz–3.4 kHz) was applied. Second, non-speech removal: silent periods were removed using an energy-based voice activity detection (VAD) method. Third, segmentation: the audio files were split into fixed-length windows (2 s) with a 25% overlap to retain context. Finally, the amplitude of each signal was normalized using its root-mean-square (RMS) energy. Figure 4 shows the pre-processed data, displaying the original audio waveforms with their corresponding trimmed and normalized versions.
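The silence removal, segmentation, and normalization steps can be sketched in plain Python as follows. This is an illustrative sketch, not the authors' implementation: the energy threshold, the VAD frame length, the target RMS level, and the toy sample rate are all assumed values, and the Butterworth bandpass filter is omitted because it would require an external signal-processing library.

```python
import math


def drop_silence(samples, frame_len, energy_threshold):
    """Energy-based VAD: keep only frames whose mean energy exceeds the threshold."""
    kept = []
    for i in range(0, len(samples), frame_len):
        frame = samples[i:i + frame_len]
        energy = sum(s * s for s in frame) / len(frame)
        if energy > energy_threshold:
            kept.extend(frame)
    return kept


def segment(samples, sample_rate, window_s=2.0, overlap=0.25):
    """Split a signal into fixed-length windows that overlap by the given fraction."""
    win = int(round(window_s * sample_rate))
    hop = int(round(win * (1.0 - overlap)))
    return [samples[i:i + win] for i in range(0, len(samples) - win + 1, hop)]


def rms_normalize(samples, target_rms=0.1):
    """Scale a segment so that its RMS amplitude equals target_rms (assumed level)."""
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    if rms == 0.0:
        return list(samples)
    return [s * target_rms / rms for s in samples]


sample_rate = 1000  # toy rate to keep the example fast; the recordings use 44.1 kHz
silence = [0.0] * 1000
tone = [math.sin(2 * math.pi * 5 * n / sample_rate) for n in range(5000)]

speech = drop_silence(silence + tone, frame_len=100, energy_threshold=1e-4)
segments = segment(speech, sample_rate)          # 2 s windows, 25% overlap
normalized = [rms_normalize(seg) for seg in segments]
print(len(speech), len(segments))
```

At the 44.1 kHz recording rate, the same settings give windows of 88,200 samples that advance by 66,150 samples.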
Spectrogram generation
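The clean audio was converted to 2D logarithmic spectrograms using the following settings: window size, 512 samples; hop length, 256 samples; number of Mel bands, 128; spectrogram resolution, 224 × 224 pixels; library used, Librosa (Python). Mathematically, the audio signal is converted into a time-frequency representation using the short-time Fourier transform (STFT), given in Eq. (1):
\[X(m,k)=\sum_{n=0}^{N-1}x(n+mH)\,w(n)\,{e}^{-j2\pi kn/N} \tag{1}\]
where \(x(n)\) is the pre-processed audio signal, \(w(n)\) is the window function, \(N\) is the window size (512 samples), \(H\) is the hop length (256 samples), \(m\) is the frame index, and \(k\) is the frequency-bin index.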
These spectrograms were saved as image arrays and used as input to the CNN; examples are shown in Fig. 5.
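To make the short-time Fourier transform step concrete, the sketch below computes a magnitude STFT with the stated window size (512) and hop length (256) using a direct DFT in plain Python. It is a slow, dependency-free illustration rather than the Librosa pipeline used in the study; the Hann window, the test tone, and the omission of the Mel filter bank and logarithmic scaling are assumptions made for brevity.

```python
import cmath
import math


def stft_magnitude(x, n_fft=512, hop=256):
    """Magnitude STFT with a Hann window, computed by a direct DFT per frame."""
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / n_fft) for n in range(n_fft)]
    # Precomputed complex exponentials e^{-j 2 pi t / n_fft} for t = 0..n_fft-1.
    twiddle = [cmath.exp(-2j * math.pi * t / n_fft) for t in range(n_fft)]
    frames = []
    for start in range(0, len(x) - n_fft + 1, hop):
        seg = [x[start + n] * window[n] for n in range(n_fft)]
        frames.append([
            abs(sum(seg[n] * twiddle[(k * n) % n_fft] for n in range(n_fft)))
            for k in range(n_fft // 2 + 1)  # non-negative frequency bins only
        ])
    return frames


# Hypothetical test signal: a pure tone centred exactly on frequency bin 32.
tone_bin = 32
signal = [math.sin(2 * math.pi * tone_bin * n / 512) for n in range(2048)]

spec = stft_magnitude(signal)
num_frames = len(spec)
num_bins = len(spec[0])
peak_bin = max(range(num_bins), key=lambda k: spec[0][k])
print(num_frames, num_bins, peak_bin)
```

Each frame yields 257 frequency bins; in the described pipeline these would then be mapped to 128 Mel bands, log-scaled, and resized to the image resolution.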
In addition, the data set was divided into training, validation, and test sets, comprising 70%, 15%, and 15% of the data, respectively. The usability was evaluated along six dimensions: learnability (ratio of tasks a user could complete to the total number of tasks), efficiency (proportion of farmers who could complete the task within the time budget), effectiveness (average completion ratio over sessions), as well as configurability, satisfaction, and understandability (tuple ratings). For all usability scores, descriptive statistics (mean and standard deviation percentages) were calculated. Inferential analyses included chi-square tests to compare completion rates across demographic variables (e.g., age, education) and t-tests or analysis of variance (ANOVA) comparisons of efficiency by group. A significance level of p < 0.05 was used. Data preprocessing and CNN training were performed in Python 3.11 using TensorFlow 2.15, Keras, and Librosa; statistical analysis was performed using SPSS v26 and Python packages (scipy, statsmodels).
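A 70/15/15 split can be sketched as below. The random seed and the use of item indices are assumptions for illustration; integer arithmetic is used for the split sizes so that floating-point rounding cannot shift a boundary.

```python
import random


def split_indices(n_items, train_pct=70, val_pct=15, seed=42):
    """Shuffle item indices and split them into train/validation/test sets."""
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)  # seeded for reproducibility
    n_train = n_items * train_pct // 100
    n_val = n_items * val_pct // 100
    train = indices[:n_train]
    val = indices[n_train:n_train + n_val]
    test = indices[n_train + n_val:]  # remainder goes to the test set
    return train, val, test


# Example with 350 items, one per farmer.
train_idx, val_idx, test_idx = split_indices(350)
print(len(train_idx), len(val_idx), len(test_idx))
```

If several spectrogram segments come from the same farmer, splitting by farmer rather than by segment, as in this example, keeps one speaker's voice from appearing in both the training and test sets.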
Working of the proposed model
In this section, we present the complete work for the classification of our proposed 2D-CNN model through the 13 layers presented in Table 1 and visualized in Fig. 6. Here we feed vocal spectrogram image data of size 128 × 128 × 3 (height, width, RGB channels) into the input layer of the proposed system. These spectrogram images are extracted from audio responses gathered using a farmer-interaction questionnaire system.
The methodology begins with identifying the farmer’s requirements and collecting vocal data through structured questionnaires. These audio responses are preprocessed using trimming, normalization, and spectrogram generation using the Mel scale. The resulting spectrogram images are resized to 128 × 128 × 3 for CNN compatibility.
These input spectrograms are passed through the first Conv2D layer, which generates a feature map of size 126 × 126 × 32. This layer uses 32 filters of size 3 × 3 and applies ReLU activation for non-linearity. The shape of the output data is computed using Eq. (2):
\[{OUT}_{H}=\left\lfloor \frac{h+2p-{f}_{h}}{s}\right\rfloor +1,\qquad {OUT}_{W}=\left\lfloor \frac{w+2p-{f}_{w}}{s}\right\rfloor +1 \tag{2}\]
where \({OUT}_{H}\) and \({OUT}_{W}\) are the output height and output width.
Here h and w are the input height and width; p is the padding size; \({f}_{h}\) and \({f}_{w}\) are the height and width of the filter; and s is the stride. With h = w = 128, p = 0, \({f}_{h}={f}_{w}=3\), and s = 1, Eq. (2) gives 126 × 126. This is followed by batch normalization, which stabilizes and accelerates the training process. Next, MaxPooling2D with a 2 × 2 filter and stride of 2 reduces the spatial dimensions to 63 × 63 × 32.
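The feature-map sizes quoted in this section can be checked with a few lines of Python. The sketch assumes no padding and a stride of 1 for every 3 × 3 convolution, which is what the stated 128 → 126 reduction implies, and 2 × 2 pooling with stride 2 throughout.

```python
def conv_out(size, kernel=3, padding=0, stride=1):
    """Output size along one dimension of a convolution, following Eq. (2)."""
    return (size + 2 * padding - kernel) // stride + 1


def pool_out(size, pool=2, stride=2):
    """Output size along one dimension of max pooling without padding."""
    return (size - pool) // stride + 1


size = 128  # input spectrogram height and width
trace = []
for filters in (32, 64, 128):  # depth of the three Conv2D layers
    size = conv_out(size)
    trace.append(("conv", size, filters))
    size = pool_out(size)
    trace.append(("pool", size, filters))

flattened = size * size * filters  # length of the vector fed to the dense layer
for step in trace:
    print(step)
print(flattened)
```

The trace reproduces the spatial sizes 126, 63, 61, 30, 28, and 14 and the flattened length of 25,088 reported for the three convolution and pooling stages.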
This output is passed to the second Conv2D layer, which increases the depth to 64 while reducing the spatial size to 61 × 61. ReLU is applied for non-linearity, followed again by batch normalization and MaxPooling2D, resulting in a size of 30 × 30 × 64.
The process continues through a third Conv2D layer, expanding the depth to 128 and reducing the size to 28 × 28. Again, batch normalization and MaxPooling2D reduce the shape to 14 × 14 × 128. These progressive convolutions help capture deeper hierarchical features. The final feature map is flattened into a 1D vector of size 25,088, which is passed into a dense layer with 128 neurons and ReLU activation. To mitigate overfitting, dropout is applied with a rate of 0.5. The final dense layer uses a SoftMax activation function to classify the input into one of the seven emotion categories. The SoftMax function computes the probability distribution across the target classes, defined as:
\[\mathrm{SoftMax}({z}_{i})=\frac{{e}^{{z}_{i}}}{\sum_{j=1}^{n}{e}^{{z}_{j}}}\]
where \({z}_{i}\) is the input score for class i, and n is the total number of classes.
The output values range between 0 and 1 and sum up to 1, allowing confident classification. Post-inference, the model’s predictions are validated using usability factors such as accuracy, interpretability, and consistency. If the validation score exceeds a predefined usability threshold (e.g., > 0.5), the model is deemed successfully validated. Otherwise, the system uses reinforcement learning to refine the questionnaire, iteratively improving the input data quality and model accuracy.
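As a small numerical illustration of the SoftMax step, the sketch below converts a vector of class scores into probabilities and picks the most likely emotion. The score values are hypothetical, and subtracting the maximum score before exponentiation is a standard numerical-stability measure that leaves the result unchanged.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of class scores."""
    m = max(scores)
    exps = [math.exp(z - m) for z in scores]
    total = sum(exps)
    return [e / total for e in exps]


EMOTIONS = ["anger", "disgust", "fear", "happiness",
            "sadness", "astonishment", "neutral"]

# Hypothetical output scores of the final dense layer for one spectrogram.
logits = [0.2, -1.0, 0.5, 3.1, 0.0, 0.4, 1.2]

probs = softmax(logits)
predicted = EMOTIONS[probs.index(max(probs))]
print(predicted, round(max(probs), 3))
```

Every probability lies strictly between 0 and 1 and the values sum to 1, so the class with the largest score always receives the largest probability.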
Usability factors to validate the proposed model
To validate the effectiveness of this model, a usability test is performed using six different usability factors. If the value of these factors is greater than 0.5, then we can say that the proposed model provides an effective solution to examine farmers’ mental health. Usability factors used for this validation are
Learnability
Learnability can be calculated by dividing the number of tasks a user can achieve by the total number of tasks22. Farmers are asked questions, and their audio responses are captured as they answer. This experiment was carried out on 350 farmers, of whom 337 farmers answered all the questions (100%), 8 farmers answered 94% of the questions, and the remaining 5 farmers answered 92% of the questions. Learnability can be calculated as the average of these response rates using the following equation.
\[\text{Learnability}=\frac{100+94+92}{3}=95.33\]
From the above equation, the estimated value for learnability is 95.33%.
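The learnability figure can be reproduced as the unweighted mean of the three completion levels. For comparison only, the sketch also computes a mean weighted by the number of farmers at each level; that weighted variant is an alternative reading introduced here and is not a value reported in the paper.

```python
# (number of farmers, percentage of questions answered)
groups = [(337, 100.0), (8, 94.0), (5, 92.0)]

# Unweighted mean of the three completion levels, matching the reported 95.33.
simple_mean = sum(pct for _, pct in groups) / len(groups)

# Alternative: mean weighted by how many farmers reached each level.
total_farmers = sum(n for n, _ in groups)
weighted_mean = sum(n * pct for n, pct in groups) / total_farmers

print(round(simple_mean, 2), round(weighted_mean, 2))
```

Under either reading the value lies well above the 0.5 (50%) usability threshold.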
Efficiency
The model’s efficiency can be gauged by how long it takes farmers to answer. A total of 40 min was given to perform this task, where out of 350 farmers, 336 farmers answered all the questions in time, while the rest of the farmers did not answer all the questions within that time. The average efficiency of this proposed model can be taken as 0.96, or 96%.
Configurability
It can be calculated using the following equation24:
The proposed model can be configured in any environment and updated by adding or changing questions, so its configurability value is 1. The purpose of this project is to convey the theoretical adaptability of the model to different computing platforms since the core system is implemented using standard Python-based modules. However, the results should not be interpreted as empirical validation, as no multi-platform or cross-environment deployment was conducted.
Satisfaction
Satisfaction measures how many users are satisfied with the proposed model. 88% of farmers said the model was excellent; 7% said it was good and flexible. For the rest of the farmers, the proposed model needs some improvement. The general satisfaction level can be taken as 0.95, or 95%.
Understandability
The existence of meta-information (EMI) can be used to measure the understandability of any model. The value of EMI will be 1 if metadata exists; otherwise, its value will be 024. For helping farmers, all instructions were properly available as metadata, and personal assistance was also provided. In this case, the value of the EMI can be considered at 1.
Effectiveness
The percentage of tasks completed by farmers can be used to denote the effectiveness of the proposed model24. The experiment was carried out in five separate sessions, with 70 farmers participating in each session. The number of farmers who completed 100% of the task in successive sessions was 64, 67, 62, 63, and 65. The following equation can be used to calculate the average value of effectiveness25.
\[\text{Effectiveness}=\frac{1}{N}\sum_{x=1}^{N}\frac{{x}_{y}}{F}\]
In this equation, N denotes the total number of sessions, whereas F denotes the number of users (farmers) in the respective session. The result of each session (x) for its farmers (y), that is, the number of farmers who completed the task in that session, is denoted as \({x}_{y}\).
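One reading of the effectiveness measure is the mean, over sessions, of the fraction of farmers who completed the whole task. The sketch below applies that reading to the session counts given above and also recomputes the efficiency ratio from the Efficiency subsection; the final check against the 0.5 usability threshold is included for illustration.

```python
completed = [64, 67, 62, 63, 65]  # farmers completing 100% of the task, per session
farmers_per_session = 70

# Mean completion fraction across the five sessions.
per_session = [c / farmers_per_session for c in completed]
effectiveness = sum(per_session) / len(per_session)

# Efficiency: farmers who answered all questions within the 40 min budget.
efficiency = 336 / 350

# Usability threshold used in this paper for validating the model.
passes_threshold = all(v > 0.5 for v in (effectiveness, efficiency))
print(round(effectiveness, 3), round(efficiency, 2), passes_threshold)
```

Under this reading the effectiveness is about 0.917, since 321 of the 350 participants completed the whole task.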
Results and discussions
Qualitative result analysis
The proposed CNN model was trained using 70% of the available dataset to evaluate its effectiveness in detecting and classifying emotions related to farmers’ mental health. The remaining data were split between validation and testing. The model was trained for 20 epochs. The training and validation curves are shown in Fig. 7.
The accuracy graph in Fig. 7a shows a rapid improvement in both training and validation accuracy during the first few epochs. Training accuracy reaches nearly 100%, and validation accuracy closely follows, peaking around 98–99% by epoch 10. This indicates that the model is highly effective in capturing patterns in the training data while maintaining strong performance on validation data, suggesting good generalization capability. However, there is minor fluctuation in validation accuracy after epoch 10, which might hint at early signs of overfitting, but the differences are relatively small. The model maintains high validation accuracy throughout the training process, demonstrating its robustness.
The loss graph in Fig. 7b complements the accuracy plot by showing a steep drop in training and validation loss initially. Training loss quickly converges near zero, and validation loss also drops significantly from above 5.0 to below 0.5 within the first few epochs. After epoch 5, both losses stabilize at low values. While the validation loss remains low, there are some small spikes, particularly after epoch 15. These could indicate moments of slight instability or noise in the validation data, but they do not significantly impact overall accuracy. The persistent low training loss and relatively flat validation loss trend suggest that the model is not severely overfitting. The model exhibits excellent training behaviour, achieving near-perfect accuracy with minimal loss. Validation metrics are consistently high, showing strong generalization to new data. Minor fluctuations in validation performance after several epochs are within acceptable limits and do not compromise the overall effectiveness of the model.
Upon completion of training, the model demonstrated strong generalization capabilities. It achieved a training accuracy of 99.67% with a validation accuracy of 99.73%, while maintaining a validation loss of just 10%, indicating minimal overfitting and high robustness.
The emotion-wise classification results are summarized in Table 2. The model performed consistently across all seven emotion categories (Anger, Astonishment, Disgust, Fear, Happiness, Neutral, and Sadness), achieving F1-scores ranging from 0.9800 to 0.9950, with particularly high performance in detecting Happiness (F1-score: 0.9950) and Disgust (F1-score: 0.9900). Even the lowest-performing category, Fear, achieved a high F1-score of 0.9800, which reflects the model’s effectiveness across varied emotional states.
The performance of the model is represented through the confusion matrix in Fig. 8, where the model demonstrates high precision and recall, as evidenced by the diagonal dominance in the confusion matrix. Most values lie on the diagonal, indicating that the model correctly classified most instances for each emotion. Specifically, happiness was predicted with perfect accuracy (100/100 correct), and disgust, neutral, and anger also show strong performance, with only 1 misclassification each. Fear, astonishment, and sadness had 2 misclassifications each but still retained high accuracy (98%). Anger was once confused with astonishment, and vice versa, possibly due to similar vocal intensities or expression patterns in the dataset. Fear and sadness were sometimes confused with each other, which is expected due to overlapping emotional cues. Despite these misclassifications, no class suffered from a major confusion pattern, and the errors appear to be minimal and randomly distributed rather than systematic.
These results highlight the model’s ability to accurately identify and differentiate between emotional states that are indicative of farmers’ mental well-being. The overall accuracy of 98.71%, combined with macro- and weighted averages above 98.70%, reflects a well-balanced classifier with minimal bias towards any emotion class. The high performance indicates that the proposed CNN model is effective for emotion recognition applications, particularly for monitoring mental health in agricultural communities. This could prove valuable in developing scalable, technology-driven solutions for mental health issues among farmers.
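The overall accuracy can be checked against the per-class counts quoted from the confusion matrix. The sketch below assumes 100 test samples per class, which is implied by "100/100 correct" and by two misclassifications corresponding to 98%; the variable names are illustrative.

```python
# Correct predictions per class, as described in the text for Fig. 8.
# Assumption: each class has 100 test samples.
SAMPLES_PER_CLASS = 100
correct = {
    "Happiness": 100,
    "Disgust": 99,
    "Neutral": 99,
    "Anger": 99,
    "Fear": 98,
    "Astonishment": 98,
    "Sadness": 98,
}

# Per-class recall is the share of each class that was classified correctly.
per_class_recall = {name: n / SAMPLES_PER_CLASS for name, n in correct.items()}

total_samples = SAMPLES_PER_CLASS * len(correct)
overall_accuracy = sum(correct.values()) / total_samples
macro_recall = sum(per_class_recall.values()) / len(per_class_recall)

print(f"overall accuracy: {overall_accuracy:.2%}")  # 691 of 700 correct
```

With balanced classes, overall accuracy and macro-averaged recall coincide, which is consistent with the macro- and weighted averages reported alongside the 98.71% figure.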
Statistical validation of the proposed model using a t-test
A pairwise t-test was performed on the F1-score of each emotion class to statistically verify the consistency of the deep learning model’s performance across emotion classes. Our goal was to determine whether the differences in the model’s performance on different emotion classes are statistically significant.
The class-wise F1-scores (Table 3 and Fig. 9) varied between 0.9849 and 1.0000, with a macro-average of 0.9872 and a weighted average of 0.9872. The model was robust for most classes, such as Disgust, Neutral, and Happiness, with an F1-score of 0.99 or above.
The boxplot shows that most F1-scores cluster at or near the upper bound of 1.0, with the exception of two classes (sadness and astonishment, which fall just below it). This was tested statistically with a one-sample t-test of the class-wise F1-scores against the perfect benchmark value of 1.0.
The p value of the t-test gives statistical evidence on how the model’s performance is consistent. The null hypothesis (H0) was that the class-wise F1-score did not deviate from the ideal score of 1.0, and the alternative hypothesis (H1) stated that there was a statistical deviation as compared to the ideal score. Based on the analysis, the p value obtained was greater than 0.05, indicating that the differences observed are not statistically significant. As a result, we fail to reject the null hypothesis. Additionally, effect sizes (Cohen’s d) across all classes were < 0.2, confirming negligible practical differences. Furthermore, 95% confidence intervals for class-wise F1-scores (0.982–1.000) demonstrated narrow variability and high reliability. This combined evidence shows that variations in F1-scores across emotion classes are statistically non-significant, with trivial effect sizes, thereby confirming that the proposed model is both robust and uniformly consistent across emotional categories.
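For reference, the standard definitions of the one-sample t-statistic and the one-sample Cohen's d are given below. The source does not state which variant of Cohen's d was computed, so this form is an assumption. Here $\bar{x}$ and $s$ are the mean and sample standard deviation of the class-wise F1-scores, $\mu_0 = 1.0$ is the benchmark, and $n = 7$ is the number of emotion classes, giving $n - 1 = 6$ degrees of freedom.

```latex
t = \frac{\bar{x} - \mu_0}{s / \sqrt{n}}, \qquad
d = \frac{\bar{x} - \mu_0}{s}, \qquad
t = d \, \sqrt{n}
```

Under these definitions the two quantities are tied together: an effect size of $|d| < 0.2$ with $n = 7$ gives $|t| < 0.2\sqrt{7} \approx 0.53$.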
Proposed model validation on usability factors
Six usability factors were used to validate the effectiveness of the proposed model. These factors scored high, with estimated values between 0.92 and 1.0, indicating that the model is not only valid but also easy to use and stable. This finding indicates that the model is suitable for field applications, especially given the sensitive nature of assessing the mental health of farmers. Satisfaction and understandability scored perfectly (1.0) in the usability test, which means that users considered the system highly intuitive and satisfying overall. Likewise, configurability and efficiency achieved very high scores (about 0.96), reflecting good support for smooth configuration and operational simplicity; all information can be easily transmitted. Usability values for learnability and satisfaction were high (around 0.95), which means that new users can learn the system very easily.
Effectiveness received the lowest usability score (~0.92); it is, however, well above the lower limit (0.5), which means that the model reliably achieves its purpose. This small drop points to an area for future optimization, for example through more training data or through changes to how users perceive the impact of their interactions on the interface.
The high usability scores further corroborate the efficacy of the model, not just in prediction performance (99.67% accuracy) but also in user interaction, trust, and practical utility. This finding supports the model’s readiness for real-world deployment in mental health monitoring, especially in rural and agro-based localities that require intuitive and dependable instruments.
The results of the validation test are summarized in Table 4, and Fig. 10 provides a graphical representation.
Quantitative evaluation of the proposed model
Speech Emotion Recognition (SER) is a publicly available dataset that combines four different datasets (Crema, Savee, Tess, and Ravdess)2633... This section reports the performance of the proposed 2D-CNN model, developed to classify and monitor the mental health of farmers using spectrogram representations of vocal data. The experiment was carried out using the SAVEE dataset, which contains labeled emotional speech data. After training the model on the training subset, the final training accuracy reached 99.77%, with a validation accuracy of 99.73%, indicating the model’s strong generalizability. The validation loss was observed to be 10%, indicating minimal overfitting and efficient learning. The testing phase further validated the model’s robustness. The test accuracy was recorded at approximately 99.12%, closely aligned with the training and validation accuracies. This consistency reinforces the effectiveness of the proposed CNN architecture in extracting deep emotional patterns from spectrograms. The model achieved an overall precision of 99.20%, a recall of 99.95%, and an F1-score of 99.07%, as reported in Table 5, which confirms its ability to accurately detect emotional states across gender. Furthermore, training and validation accuracy and loss curves for 20 epochs are shown in Fig. 11. The graphs reflect steady learning with high accuracy and reduced loss over time. The validation curves closely follow the training curves, indicating minimal variance and a well-tuned model.
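Precision, recall, and F1-score of the kind reported above are typically derived from a confusion matrix. The following is a minimal sketch of macro-averaged metrics in plain Python; the averaging scheme is an assumption (the source does not state whether macro, micro, or weighted averaging was used), and the 3-class matrix is hypothetical rather than taken from the paper.

```python
def macro_metrics(cm):
    """Return macro-averaged (precision, recall, F1) for a square confusion
    matrix where cm[i][j] counts samples of true class i predicted as j."""
    n = len(cm)
    precisions, recalls, f1s = [], [], []
    for k in range(n):
        tp = cm[k][k]
        predicted_k = sum(cm[i][k] for i in range(n))  # column sum
        actual_k = sum(cm[k])                          # row sum
        p = tp / predicted_k if predicted_k else 0.0
        r = tp / actual_k if actual_k else 0.0
        f1 = 2 * p * r / (p + r) if (p + r) else 0.0
        precisions.append(p)
        recalls.append(r)
        f1s.append(f1)
    return sum(precisions) / n, sum(recalls) / n, sum(f1s) / n


# Hypothetical 3-class confusion matrix (not data from the study).
cm = [
    [48, 2, 0],
    [1, 49, 0],
    [0, 0, 50],
]
precision, recall, f1 = macro_metrics(cm)
print(f"precision={precision:.4f} recall={recall:.4f} f1={f1:.4f}")
```

Because the macro F1-score averages per-class F1 values, it need not equal the harmonic mean of the macro precision and macro recall.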
These results confirm the feasibility of the model for use in the real world in assessing the mental health of farmers through their vocal expressions. The integration of CNN with audio-derived spectrogram features has proven to be a powerful approach for emotion classification and mental state analysis.
The performance of the proposed model is also compared with existing state-of-the-art approaches on the SAVEE dataset in Table 6.
Application of reinforcement learning for cross validation of the proposed model
To improve the performance and robustness of the CNN model, reinforcement learning (RL) was incorporated into the structure for dynamic feature selection and classification decision-making. The best model using reinforcement learning (CNN + RL) achieved precision, recall, F1-score, and accuracy of 0.99639, 0.99652, 0.99634, and 0.99638, respectively. Notably, the baseline CNN model without reinforcement learning obtained the same results. This suggests that although reinforcement learning may offer adaptive learning advantages and holds promise for exploration–exploitation trade-offs, in this specific scenario it did not add value to the standalone CNN architecture. This result could be explained by the already high baseline performance of the CNN model, which leaves little room for improvement, or by a reward strategy in the RL module that was not well suited to the task. Nevertheless, these results indicate that the CNN model is strong and resilient for the target task, even without reinforcement.
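The source does not specify which RL algorithm or reward function was used. Since the reference list includes a paper on Q-learning, the sketch below shows one tabular Q-learning update purely as an illustration of the general technique. The state and action names, the learning rate `alpha`, the discount factor `gamma`, and the toy table are all hypothetical and are not the authors' configuration.

```python
def q_update(q, state, action, reward, next_state, alpha=0.1, gamma=0.9):
    """Apply one tabular Q-learning update in place and return the new value.

    q maps each state to a dict of {action: estimated value}.
    """
    # Value of the best action available in the next state (0 if unknown).
    best_next = max(q[next_state].values()) if q.get(next_state) else 0.0
    td_target = reward + gamma * best_next
    q[state][action] += alpha * (td_target - q[state][action])
    return q[state][action]


# Hypothetical toy table: two states, two feature-selection actions.
q = {
    "s0": {"keep": 0.0, "drop": 0.0},
    "s1": {"keep": 1.0, "drop": 0.0},
}
new_value = q_update(q, "s0", "keep", reward=1.0, next_state="s1")
print(round(new_value, 4))
```

If the reward in such a scheme is derived from a classifier that is already near its accuracy ceiling, the updates carry little signal, which is one way to read the identical CNN and CNN + RL results.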
Conclusion and future scope
This research proposes an intelligent digital model that uses a convolutional neural network to examine the mental health of Indian farmers, taking multiple factors into account. The data source is the audio responses of Indian farmers, captured while they answered the questionnaire. The proposed model produced results with high accuracy (99.67%). It was also validated through a usability test on six factors: learnability, efficiency, configurability, satisfaction, understandability, and effectiveness. The values of these factors support its suitability for rural settings that face literacy barriers and have limited access to the mental health professionals who administer such surveys in other countries, and they show that the proposed model is a promising tool for examining the mental health of Indian farmers. The proposed model was found to outperform existing approaches in predictive accuracy and can provide a scalable, practical, and culturally fair solution for early detection.
Future work could expand the sample size to better analyze subgroups by gender, age group, and region (limited in this study by an unbalanced data set). These findings could inform mobile applications and AI-based platforms that offer young farmers real-time support and resources, while hybrid algorithms and longitudinal studies could improve predictive precision by capturing long-term relationships between agricultural stressors, as risk factors, and mental health outcomes. However, impediments remain, including low participation of women in sub-samples, small numbers for age-group stratification, implementation costs, and limited digital infrastructure and literacy among Indian farmers. It will be essential to address these through targeted awareness campaigns, user-friendly interfaces, and integration with national mental health initiatives. Training health care workers to use AI tools can help sustain deployment. With such advancements, these models can support not just early detection but also immediate decision support, and can strengthen the existing mental health ecosystem for agriculture.
Data availability
We prepared a questionnaire covering stress levels, coping mechanisms, and social support systems for farmers. Vocal data were collected while Indian farmers answered the questions. For this work, 350 farmers from different geographical regions of India were interviewed. The audio dataset was annotated based on factors present in the audio responses, such as emotions, behavior, and average time delay, as required for training the proposed model. A link to the dataset used for validation purposes is provided. Data Set Link: https://www.kaggle.com/datasets/dmitrybabko/speech-emotion-recognition-en
References
Statista Research Department. Agricultural Sector in India—Statistics & Facts, Statista. Available: https://www.statista.com/topics/4868/agricultural-sector-in-india/#topicOverview (2024).
Ghatak, S. Agricultural and rural development in India: A rejoinder. ESocialSciences. http://www.esocialsciences.com/data/eSSResearchPapers/eSSWPArticle2009429121425.doc (2000).
Ghosh, J. The political economy of farmer suicides in India. Freedom from Hunger Lecture Series, New Delhi (2005).
Sen, P. Farmer distress: Missing the macroeconomic factor. Seminar 713(1), 26–42 (2019). https://www.india-seminar.com/2019/713/713_pronab_sen.htm.
The World Bank. Farming in India is a family enterprise – and needs to be supported as such. The World Bank, Nov. 13, 2023. [Online]. Available: https://rb.gy/utirri.
Baba, U. India’s shocking farmer suicide epidemic. Al Jazeera, May 18, 2015.
Agarwal, K. For the third time in three months, farmers to protest in Delhi. The Wire, November 2018. https://thewire.in/agriculture/third-time-three-months-farmers-protest-delhi.
Hossain, M. M. et al. Suicide of a farmer amid COVID-19 in India: Perspectives on social determinants of suicidal behavior and prevention strategies. Center for Open Science, pp. 1–6 (2020).
Chitra, A. & Gopinath, R. A study on causes of stress to the farmers during COVID-19 pandemic. Int. J. Aquatic Sci. 12(2), 773–782 (2021).
Jain, P. Suicides among farm workers rose last year. The Hindu, Oct. 30, 2021. [Online]. Available: https://www.thehindu.com/news/national/suicides-among-farm-workers-rose-last-year/article37235086.ece
Thakur, M. An economic analysis of plight of farmers suicide in India.
Kallakuri, S., Devarapalli, S., Tripathi, A. P., Patel, A. & Maulik, P. K. Common mental disorders and risk factors in rural India: baseline data from the SMART Mental Health project. BJPsych Open 4(4), 192–198 (2018).
Daghagh Yazd, S., Wheeler, S. A. & Zuo, A. Key risk factors affecting farmers’ mental health: A systematic review. Int. J. Environ. Res. Public Health 16(23), 4849 (2019).
Garg, K. Depression, suicidal ideation, and resilience among rural farmers. J. Neurosci. Rural Pract. 10(2), 175 (2019).
Bomble, P. & Lhungdim, H. Mental health status of farmers in Maharashtra, India: A study from farmer suicide prone area of Vidarbha region. Clin. Epidemiol. Glob. Health 8(3), 684–688 (2020).
Padhy, C. & Raju, P. S. Mental health of farmers-need of the hour. Int. J. Agric. Environ. Biotechnol. 13(1), 87–91 (2020).
Younker, T. & Radunovich, H. L. Farmer mental health interventions: a systematic review. Int. J. Environ. Res. Public Health 19(1), 244 (2021).
Mishra, D. & Satapathy, S. The pandemic COVID-19 and its impact on Indian agricultural sectors: An assessment of farmers. J. Glob. Inf. Manag. 30(4), 1–27 (2021).
Palmer, K. & Strong, R. Evaluating impacts from natural weather related disasters on farmers mental health worldwide. Adv. Agric. Dev. 3(1), 43–56 (2022).
Behere, P. B., Agarwal, S., Chowdhury, D., Behere, A. P. & Yadav, R. Challenges in community psychiatry–farmer suicides and their survivors. In Handbook on Optimizing Patient Care in Psychiatry, pp. 198–203, Routledge (2022).
Balasubramanian, T. Global warming and health hazards to Indian farmers. J. Agrometeorol. 25(1), 92–97 (2023).
Kalpanadevi, D. Design and implementation of human-computer interface based cognitive model for examine the skill factor of students. In 2019 3rd International Conference on Computing Methodologies and Communication (ICCMC), pp. 737–742, IEEE (2019).
Albawi, S., Waleed, J. & Abboud, A. J. Deep cnn-based-flower species recognition system. In 2023 3rd International Scientific Conference of Engineering Sciences (ISCES), pp. 54–58, IEEE (2023).
Alvaro, A., Almeida, E. & Meira, S. Quality attributes for a component quality model. In 10th WCOP/19th ECCOP, Glasgow, Scotland, pp. 31–37 (2005).
Singh, A. P. & Tomar, P. Estimation of component reusability through reusability metrics. International Journal of Computer, Electrical, Automation, Control and Information Engineering, vol. 8, no. 11, pp. 1965
Chitre, N., Bhorade, N., Topale, P., Ramteke, J. & Gajbhiye, C. Speech emotion recognition to assist autistic children. In 2022 International Conference on Applied Artificial Intelligence and Computing (ICAAIC), pp. 983–990, IEEE (2022).
Ottoni, L. T. C., Ottoni, A. L. C. & Cerqueira, J. d. J. F. A deep learning approach for speech emotion recognition optimization using meta-learning. Electronics 12(23), 4859 (2023).
Panda, S. K., Jena, A. K., Panda, M. R. & Panda, S. Speech emotion using multimodal feature fusion with machine learning approach. Multimed. Tools Appl. 82(27), 42763–42781 (2023).
Rathod, V. et al. Improved remote mental health illness assessment and detection using facial emotion detection and speech emotion detection. Int. J. Health Sci. 6(2), 9577–9590 (2022).
Gummula, R., Arumugam, V. & Aranganathan, A. Facial emotion recognition using enhanced multi-verse optimizer method. Int. J. Electr. Comput. Eng. 14(2), 1519 (2024).
Kothuri, S. R. & Rajalakshmi, N. R. A hybrid feature selection model for emotion recognition using shuffled frog leaping algorithm (sfla)-incremental wrapper-based subset feature selection (iwss). Indian J. Comput. Sci. Eng. 13(2), 354–364 (2022).
Singh, P., Srivastava, R., Rana, K. & Kumar, V. A multimodal hierarchical approach to speech emotion recognition from audio and text. Knowl.-Based Syst. 229, 107316 (2021).
Sun, C. Fundamental Q-learning algorithm in finding optimal policy. In 2017 International Conference on Smart Grid and Electrical Automation (ICSGEA), pp. 243–246, IEEE (2017).
Funding
Open access funding provided by Manipal University Jaipur.
Ethics declarations
Competing interests
The authors state that they have no known competing interests or personal connections that might have influenced the research presented in this publication.
Ethical approval
It is important to note that all procedures utilized in studies involving human participants were performed in accordance with the ethical principles of the institutional and/or national research committees, as well as with the 1964 Helsinki Declaration and its subsequent amendments or equivalent ethical principles. This article does not contain any studies with animals performed by any of the authors.
Consent to participate
Informed consent was obtained from all participants.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Agarwal, J., Sharma, S., Madan, P. et al. Computer intelligence based model for mental health detection among Indian farming communities. Sci Rep 15, 37872 (2025). https://doi.org/10.1038/s41598-025-21724-w
DOI: https://doi.org/10.1038/s41598-025-21724-w