Introduction

Hope is a powerful emotion or state of mind that significantly impacts human psychology and our ability to cope with life’s challenges1. Often associated with optimism, hope is characterized by an expectation of desired outcomes and the eventual fulfillment of desires or goals2. Researchers categorize hope broadly into two main types: “generalized” and “particularized” hope. These categories differ based on their targeted goals, the underlying cognitive and emotional processes, and the resulting behaviors3,4,5.

Generalized hope refers to a sense of positive future developments that is not necessarily linked to a specific goal/outcome, object, or quality; rather, it fosters a broad optimism about the future6. It plays a crucial role in building resilience: generalized hope helps individuals maintain a positive outlook in the face of adversity, encouraging them to persevere through challenges even when the specific goal is unclear7. For example, the sentence “Despite the challenges we face, I remain hopeful that brighter days are ahead of us” expresses generalized hope because it does not specify a particular goal or outcome. Instead, it conveys a broad, positive outlook toward the future regardless of current difficulties; it is not tied to a specific event or result but rather to a belief in overall improvement, reflecting the resilience and sustained hopefulness that characterize generalized hope.

While research on generalized hope is limited, Maretha8 and O’Hara et al.6 define particularized hope as hope that focuses on achieving or acquiring a specific goal or outcome. They describe this type of hope as a combination of personal emotions and mental activities related to desire, arguing that particularized hope involves a feeling of encouragement fueled by desire and the formation of expectations, which can be either realistic or unrealistic.

Realistic hope focuses on achieving a probable outcome. This type of hope involves mental imagery and an assessment of the likelihood of occurrence to maintain a connection with reality7. For example, a student who studied hard all week might think, “I believe I can pass the exam,” an expression of realistic hope.

On the other hand, unrealistic hope is based on incomplete or inaccurate information and the expectation of improbable events9. An example could be a student with poor grades who thinks, “Everyone thinks I failed, but a miracle happens every day”. Different types of hope can lead to varied behaviors1. Particularized hope, centered on achieving a specific goal, often motivates individuals to engage in targeted actions to reach that goal. This type of hope can drive a person to plan and execute steps that directly contribute to the desired outcome, like a student studying diligently for an exam they aim to pass. The specificity and clarity of the goal in particularized hope can provide a clear direction for action.

Distinguishing between different hope types, such as goal-oriented “Particularized” hope and broader “Generalized” hope, provides a multifaceted perspective on human behavior. This understanding allows us to predict behaviors (specific hope motivates action, while generalized hope fosters resilience)10, guide therapy (realistic hope can support interventions), promote well-being (understanding how hope types contribute to mental health is crucial)10, and inform personal growth (recognizing realistic hope helps in setting achievable goals and avoiding disappointment)1.

Over the past few years, several natural language processing (NLP) tasks, such as sentiment analysis, emotion detection, hate speech identification, abusive language detection, misogyny detection11, and more, have gained prominence for studying social media content and online interactions12. However, despite the significance of hope in human psychology and decision-making, only a limited number of studies have explored the types of hope and expectations shared on social media13,14. Hope speech detection as an NLP task was introduced by Chakravarthi14 as a binary classification task that categorizes YouTube comments, written in English and code-mixed Dravidian languages, regarding conflicts between India and Pakistan into two predefined categories: “Hope,” representing positivity and support, and “Not Hope,” encompassing neutral or offensive comments. This definition provided a fundamental understanding of hope and was further refined by Balouchzahi et al.13, who developed a multiclass hope speech detection dataset that classifies English tweets into four categories: “Generalized Hope”, “Realistic Hope”, “Unrealistic Hope”, and “Not Hope”. The definitions and some samples for each label are provided in Sect. “Annotation guidelines”.

Inspired by the PolyHope framework13, and to promote multiclass hope speech detection across languages, we extend the PolyHope dataset to Spanish and German. Several compelling reasons underpin this expansion, which marks an important step toward multiclass hope speech detection across linguistic boundaries.

Firstly, by incorporating Spanish, acknowledged as the second most widely spoken language by native speakers globally15, and German, a predominant language in Europe16, we enrich the dataset’s linguistic diversity. This addition allows the dataset to capture a wider range of cultural nuances in hope expression, thereby increasing its representativeness.

Moreover, this extension addresses a notable gap in research resources for hope speech detection in Spanish and German. With datasets tailored to these languages, researchers can now examine the intricacies of hope expression within these linguistic contexts, fostering deeper insights and more accurate analyses.

To the best of our knowledge, this is the first study to create multiclass hope speech datasets for both Spanish and German, building upon and significantly extending the original PolyHope framework developed for English. Unlike previous studies that focus primarily on binary classification or English-only datasets, our work introduces multilingual, fine-grained annotations that account for cultural and linguistic variation in hope expression.

The main contributions of this research paper are listed below:

  • Development of the first multiclass hope speech detection datasets for the Spanish and German languages,

  • A systematic review of existing datasets and techniques on hope speech detection,

  • Comparative analysis of different learning approaches for hope speech detection across different languages,

  • Comparative analysis of multilingual transformers for multilingual hope speech detection.

Related work

Hope speech detection is a recently established field of research; in this section, we review proposals that address binary and multiclass classification datasets and techniques. Additionally, we examine monolingual, cross-lingual, and multilingual paradigms. The main purpose of this review is to summarize what has been done so far and to point out where more work is needed in hope speech detection.

Hope speech detection datasets

Table 1 provides a summary of all available hope speech datasets along with their key characteristics. A detailed description of each dataset is presented in the following sections.

The HopeEDI dataset, as introduced by Chakravarthi14, was the first multilingual resource for detecting hope speech within the context of Equality, Diversity, and Inclusion. It comprises user-generated comments from YouTube, with 28,451 comments in English, 20,198 in Tamil, and 10,705 in Malayalam, each manually annotated to indicate the presence or absence of hope speech as a binary classification task. This dataset uniquely emphasizes content that is encouraging, positive, and supportive, making it the first to address hope speech in a multilingual framework. Subsequent work by Chakravarthi14, García-Baena et al.17, and Hande et al.18 extended this line of research with Spanish and code-mixed Kannada-English binary hope speech detection datasets.

García-Baena et al.17 created a balanced dataset of 1,650 Spanish LGBT-related tweets, labelled for Hope Speech and Non-Hope Speech. In baseline experiments, they used traditional machine learning (ML) algorithms (Support Vector Machine (SVM), Multinomial Naive Bayes (MNB), Logistic Regression (LR)) and deep learning (DL) models (Multilayer Perceptron (MLP), Convolutional Neural Network (CNN), Bidirectional Long Short-Term Memory (BiLSTM)) with various features (TF-IDF, word embeddings, BERT embeddings). They reported that the fine-tuned BETO model achieved the best performance with an F1-score of 85.12%.

Hande et al.18 introduced KanHope, a dataset for detecting hope speech in code-mixed Kannada-English texts collected from YouTube. The work addresses the scarcity of studies on spreading positivity through content such as hope speech in under-resourced languages. To this end, the researchers proposed DC-BERT4HOPE, a dual-channel BERT-based model that uses both the original code-mixed data and its English translation, achieving the highest performance with a weighted F1-score of 0.752.

Nath et al.19 presented the first dataset developed specifically to identify hope speech in the Bengali language. DL models such as CNNs and BiLSTMs and transformer models such as Bangla BERT and Multilingual BERT were evaluated on this dataset. The best performance came from MuRIL, a BERT-based transformer model designed specifically for multiple Indian languages, reaching an average macro F1-score of 0.8498.

Malik et al.20 addressed multilingual hope speech detection in English and Russian using transfer learning with a fine-tuning approach. Their work compares joint multilingual and translation-based methods, with the latter translating all content into one language for classification. The study presents a Russian corpus of YouTube comments, and experiments on bilingual datasets show that the proposed framework outperforms the baselines, with the translation-based approach achieving the best performance at 94% accuracy and an 80.24% F1-score.

Balouchzahi et al.13 presented an innovative method to recognize and classify hope speech in English tweets by creating a corpus called ’PolyHope’, which has two levels of categorization. The first level distinguishes between ’Hope’ and ’Not Hope’, while at the second level, hope speech is further classified into Generalized Hope, Realistic Hope, and Unrealistic Hope. Various models were evaluated, including traditional ML models, DL models, and transformer models such as BERT; according to their experimental results, transformers outperformed the other models, reaching a macro F1-score of 0.85 in binary classification and 0.72 in multiclass classification.

Later, Balouchzahi et al.21 set out to detect hope and hopelessness in Urdu tweets by creating the first annotated Urdu dataset with five classes, namely Generalized Hope, Unrealistic Hope, Realistic Hope, Not Hope, and Hopelessness. Several models, including LR, BiLSTM, CNN, and transformers (BERT and RoBERTa), were benchmarked in that work. While simpler algorithms such as LR worked best for binary classification, reaching an average macro F1-score of 0.7593, transformers outperformed them on the more fine-grained multiclass labeling, with RoBERTa reaching an average macro F1-score of 0.4801.

Table 1 Summary of Hope speech detection datasets in the literature.

Hope speech detection techniques

The aforementioned datasets13,14,17,18 were used to organize several shared tasks, in which the datasets were made available to researchers and several findings were reported. In this section, we therefore summarize the models submitted to two of the most recent shared tasks, named HOPE, organized at IberLEF 2023 and 202422,23.

In the HOPE shared task at IberLEF 202322, two binary classification datasets were provided with the goal of promoting research on identifying uplifting and encouraging speech in online communications. A total of 12 teams participated in the Spanish subtask, while 11 teams participated in the English subtask. The Spanish dataset comprised 1,312 training samples, 300 development samples, and 450 test samples, featuring a balanced distribution of hope speech (691 training samples) and non-hope speech (621 training samples), which contributed to better performance. In contrast, the English dataset was significantly larger, with 22,651 training samples, but it was heavily imbalanced, containing only 21 hope speech samples compared to 20,690 non-hope speech samples. This imbalance posed challenges for model training and resulted in lower scores in the English subtask, as models struggled to learn to identify hope speech effectively.

In the HOPE 2023 shared task22, various teams employed different methodologies to tackle the detection of hope speech. The I2C-Huelva team24 utilized BERTuit, a transformer model specifically designed for Spanish, achieving an average macro F1-score of 0.501 in the English subtask and 0.481 in the Spanish subtask. The Habesha team25 implemented a Support Vector Machine (SVM) model using term frequency and inverse document frequency (TF-IDF) for feature selection, reporting an average macro F1-score of 0.489 for English and 0.481 for Spanish. The top-performing team in the Spanish subtask, Zootopi26, achieved an impressive average macro F1-score of 0.916, while the best result in the English subtask was 0.501 by I2C-Huelva24, highlighting the challenges faced in the English dataset due to its imbalance. Overall, transformer-based models were predominant among the top teams, although some teams also explored classic ML techniques. Table 2 presents a summary of the teams that participated in the HOPE shared task, highlighting their learning approaches, methods, and performance metrics (F1 scores) for both Spanish and English subtasks.

Table 2 Teams and Their Approaches in the HOPE 2023 Shared Task.

The most recent iteration of the HOPE 2024 shared task23 focused on detecting hope speech from two perspectives. Task 1 emphasized equality, diversity, and inclusion, aiming to identify hope speech within texts related to the LGTBI community. Participants were provided with a training corpus and evaluated on both known and unknown domains, classifying texts as either Hope Speech (HS) or Not Hope (NH). Task 2 introduced two key differences: (i) it examined hope as a form of expectation, and (ii) it adopted a multiclass classification approach. In this task, texts were first categorized as hope or not hope, with further distinctions made between General Hope (GH), Realistic Hope (RH), and Unrealistic Hope (URH). Titled PolyHope, Task 2 explored hope as expectations across multiple domains and languages (English and Spanish), leveraging a larger dataset sourced from social media.

In this shared task, various teams employed diverse methodologies to tackle the detection of hope speech across both tasks. As in the previous edition (the HOPE 2023 shared task22), teams utilized a range of ML approaches, including traditional classifiers such as LR and SVM, as well as advanced DL techniques such as transformers and LLM-based models.

A significant number of teams, such as NTT@UIT32, ChauPhamQuocHung33, ABCD Team34, Ometeotl35, and UMUTeam36, leveraged pre-trained transformer models such as BERT, BETO, and mBERT. These models, often fine-tuned on specific datasets and combined with different prompting techniques, enabled these teams to achieve top rankings in both Task 1 and Task 2. The NTT@UIT team32 led in Task 1 with their use of ChatGPT-3.5 and advanced prompting techniques, while the ABCD Team34 excelled in Task 2 by experimenting with ensemble learning and additional preprocessing.

On the other hand, teams like CUFE37, CICPAK38, and VEL39 explored more traditional ML approaches, including methods like Gzip compression and SVM, often combined with text preprocessing. Although these methods were less commonly used, CUFE’s Gzip compression approach notably performed well, almost matching the performance of more advanced transformer-based methods in task 1.

A few teams, such as MUCS40 and Grima41, experimented with a mix of traditional ML models and transfer learning techniques. While their results were moderate, they demonstrated the potential of combining diverse algorithms, including ensemble methods, to improve classification performance. Lastly, some teams, like AmnaNaseeb42 and Lemlem43, relied mainly on traditional models and deep neural networks, participating mostly in Task 2 with varying degrees of success, often focusing on binary and multiclass classification in English. These approaches, although less sophisticated, highlighted the continuing relevance of traditional methods for specific tasks. A summary of the techniques used and their results in the HOPE 2024 shared task is presented in Table 3.

Table 3 Teams and their approaches in HOPE 2024 shared task.

Dataset development

Data collection and processing

The English keywords from the PolyHope dataset13 were translated into Spanish and German. Native speakers meticulously validated these translations to ensure comprehensive coverage of terms and their variations used in similar contexts. Tweepy API [https://docs.tweepy.org/en/stable/api.html] was used for data collection, and initially, approximately 33,330 raw tweets for German and 82,725 for Spanish were collected over an open time frame using these keywords. Then, the 50,000 most recent tweets for each language as of March 2022 were gathered and added to the keyword-based tweets. Basic preprocessing steps were applied, such as de-identifying mentions and URLs, removing duplicate tweets, and filtering out retweets and tweets with fewer than five words or containing only emojis, mentions, or URLs. The keywords used for scraping in each language and statistics of the raw data collected are presented in Table 4.
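For illustration, the filtering described above can be sketched as follows; this is a minimal, hypothetical snippet (function and variable names are ours, not the original collection script) and assumes the raw tweets have already been downloaded, e.g., via the Tweepy API.

```python
import re

def clean_and_filter(raw_tweets):
    """De-identify mentions/URLs and drop retweets, duplicates, and very short tweets."""
    seen, kept = set(), []
    for text in raw_tweets:
        if text.startswith("RT @"):                              # filter out retweets
            continue
        text = re.sub(r"@\w+", "@user", text)                    # de-identify mentions
        text = re.sub(r"https?://\S+", "http://url", text)       # de-identify URLs
        content_words = [w for w in text.split() if w not in {"@user", "http://url"}]
        if len(content_words) < 5:                               # fewer than five real words
            continue
        if text in seen:                                         # remove duplicate tweets
            continue
        seen.add(text)
        kept.append(text)
    return kept
```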

Table 4 Keywords used for data collection and statistics of unlabelled data.

Annotation guidelines

Detailed instructions were given to the annotators to assist them throughout the annotation process. The process comprised two primary steps: initially identifying tweets that conveyed hope and subsequently categorizing the type of hope into predefined classifications. The following brief outline of the task and the guidelines for two-level hope speech detection is adopted from13. These guidelines were accompanied by labeled samples provided by the authors, some of which are listed in Table 5.

  • Binary Hope Speech: in this level, each given tweet is classified as Hope or Not Hope.

    • Not Hope: The tweet does not express future-oriented hope, wish, or desire.

    • Hope: There is an expression of a future-oriented desire, wish, or want for something to happen or to be true.

  • Multiclass Hope Speech: in this level, tweets identified as expressing hope are further classified into the following fine-grained categories:

    • Generalized Hope: This type of hope reflects a general sense of optimism and hopefulness without being directed towards any specific event or outcome6,7.

    • Realistic Hope: This type of hope centers around the expectation of something reasonable, meaningful, and potentially achievable. It involves hoping for outcomes that are within the realm of possibility and likelihood, encompassing both regular and expected occurrences44,45.

    • Unrealistic Hope: This form of hope is characterized by a very low likelihood of occurrence, or even the absence of such a possibility. It often involves wishing for something to come true despite its remote or nonexistent chances of happening. Occasionally, unrealistic hopes may arise from emotions like anger, sadness, or depression, lacking any rational basis46.

Table 5 Annotated samples of MIND-HOPE dataset (GH: Generalized Hope, RH: Realistic Hope, URH: Unrealistic Hope).

Inter-annotator agreement (IAA)

The annotation process involved three annotators per language, each with at least a postgraduate background in computer science and experience in NLP tasks and annotations. The final label for each tweet was determined by majority voting among the annotators.

To assess agreement between the annotators, Cohen’s Kappa Coefficient47 was used for binary classification, while Fleiss’ Kappa48 was applied for multiclass classification. The Inter-Annotator Agreement (IAA) scores for the binary classification were 0.7814 for the Spanish dataset and 0.8524 for the German dataset. For the multiclass classification, the IAA scores were 0.7095 for Spanish and 0.8103 for German.
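For reference, both agreement coefficients can be computed with standard libraries. The snippet below is a sketch with toy labels rather than the actual annotation files; with three annotators, Cohen’s Kappa would typically be averaged over annotator pairs.

```python
from itertools import combinations

from sklearn.metrics import cohen_kappa_score
from statsmodels.stats.inter_rater import aggregate_raters, fleiss_kappa

# Binary task: average pairwise Cohen's kappa over the three annotators (toy labels)
binary_annotations = [
    ["Hope", "Not Hope", "Hope", "Hope", "Not Hope"],      # annotator 1
    ["Hope", "Not Hope", "Not Hope", "Hope", "Not Hope"],  # annotator 2
    ["Hope", "Hope", "Hope", "Hope", "Not Hope"],          # annotator 3
]
pairwise = [cohen_kappa_score(a, b) for a, b in combinations(binary_annotations, 2)]
print(sum(pairwise) / len(pairwise))

# Multiclass task: Fleiss' kappa; rows = tweets, columns = annotators, values = class ids
multiclass_annotations = [
    [0, 0, 1],
    [2, 2, 2],
    [3, 3, 1],
    [0, 0, 0],
]
counts, _ = aggregate_raters(multiclass_annotations)  # per-tweet category counts
print(fleiss_kappa(counts))
```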

Table 6 Statistics of annotated datasets.

Statistics of the datasets

The final statistics of the datasets for the Spanish and German languages are presented in Table 6. They show that, similar to the existing English hope speech dataset13, the label distribution is highly imbalanced. The distribution of labels for each language is presented in Figure 1.

Fig. 1 Label distribution over datasets (NH: Not Hope, GH: Generalized Hope, RH: Realistic Hope, URH: Unrealistic Hope).

Benchmarks

In this section, we describe the experimental setup for benchmarking on the proposed datasets, including the use of cross-validation. Specifically, we employed a 5-fold cross-validation approach in all our experiments. This method involves splitting the dataset into five equal parts, where each part is used as a validation set while the remaining four parts serve as the training set. The model is trained and evaluated five times, each time with a different fold as the validation set. The results from these iterations are then averaged to produce the final scores.
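A minimal sketch of this protocol is shown below; stratified splitting is used to preserve label proportions in each fold, and the training function is a placeholder for any of the models benchmarked in this section.

```python
import numpy as np
from sklearn.metrics import f1_score
from sklearn.model_selection import StratifiedKFold

def cross_validate(texts, labels, build_and_train, n_splits=5, seed=42):
    """Return the average macro F1 over n_splits folds.

    build_and_train(train_texts, train_labels) must return a fitted model
    exposing a predict() method (placeholder for any benchmark model).
    """
    texts, labels = np.array(texts), np.array(labels)
    skf = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    fold_scores = []
    for train_idx, val_idx in skf.split(texts, labels):
        model = build_and_train(texts[train_idx], labels[train_idx])
        predictions = model.predict(texts[val_idx])
        fold_scores.append(f1_score(labels[val_idx], predictions, average="macro"))
    return float(np.mean(fold_scores))
```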

Preprocessing

For traditional and DL models, a standard preprocessing pipeline was applied to put the text into a clean and uniform format suitable for subsequent analysis. It included lowercasing, diacritic and special character removal, stopword filtering, tokenization, and lemmatization using SpaCy. These steps help reduce sparsity and standardize the text input, especially for models relying on bag-of-words or n-gram features49,50. In contrast, for transformer-based models, we avoided external preprocessing (except lowercasing for compatibility with uncased models), as these models include subword tokenization mechanisms (e.g., WordPiece or SentencePiece) that are tightly coupled with their pretrained vocabularies51. Altering the input text structure can negatively affect transformer performance due to misalignment with the pretraining distributions. Each step of the preprocessing pipeline is described in detail below, followed by an illustrative code sketch:

  • Lowercasing: Texts for all languages were lowercased to ensure uniformity and reduce the complexity arising from case differences.

  • Accent and Diacritic Removal: The unicodedata library was utilized to strip accents and other diacritics from characters. This step handled the various forms of accented characters commonly found, especially in Spanish text.

  • Special Character Removal: Special characters, punctuation marks, and numbers were removed, retaining only alphabetic characters and spaces. This was performed using regular expressions to ensure the text is free from extraneous symbols that could interfere with text analysis.

  • Tokenization: We employed the nltk.word_tokenize function to split the normalized text into individual words (tokens). This step was language-specific and ensured that the text was broken down into its constituent parts, facilitating further processing and analysis.

  • Stop Words Removal: Language-specific stopword lists were used to filter out common words that do not contribute significant meaning to the text (e.g., in Spanish, “el”, “y”, “de”). This step reduces noise and focuses on the more informative words within the text.

  • Lemmatization: This step leveraged SpaCy’s language models to perform lemmatization, converting each token to its base or dictionary form. This process accounted for variations in word forms, such as tense, mood, and number, and consolidated them into a single, standardized form, enhancing the consistency and comparability of the text data.
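The sketch below strings these steps together for Spanish; it is illustrative rather than the exact code used, and assumes NLTK’s punkt and stopwords resources and SpaCy’s es_core_news_sm model are installed (the German pipeline is analogous).

```python
import re
import unicodedata

import nltk
import spacy
from nltk.corpus import stopwords

nlp = spacy.load("es_core_news_sm", disable=["parser", "ner"])  # keep the lemmatizer only
spanish_stopwords = set(stopwords.words("spanish"))

def preprocess(text: str) -> list[str]:
    text = text.lower()                                           # 1. lowercasing
    text = "".join(c for c in unicodedata.normalize("NFD", text)  # 2. strip accents/diacritics
                   if unicodedata.category(c) != "Mn")
    text = re.sub(r"[^a-z\s]", " ", text)                         # 3. keep letters and spaces only
    tokens = nltk.word_tokenize(text, language="spanish")         # 4. tokenization
    tokens = [t for t in tokens if t not in spanish_stopwords]    # 5. stopword removal
    doc = nlp(" ".join(tokens))                                   # 6. lemmatization with SpaCy
    return [token.lemma_ for token in doc]
```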

Traditional machine learning models

The experiments using traditional ML models encompassed a series of steps. Initially, the input texts underwent the preprocessing described in Sect. “Preprocessing” to enhance their suitability for classification. Various n-grams were then extracted from the preprocessed texts and transformed into TF-IDF features using TfidfVectorizer, representing the text data in a numerical format suitable for modelling. The models were evaluated with unigrams, bigrams, and trigrams individually as well as in combination.

Benchmarking a newly developed dataset with straightforward methods is an essential step in the research process. It establishes consistent reference points, enabling reliable evaluation across various methods or algorithms, and ensures that the comparisons made are both fair and meaningful52. Therefore, a collection of ML models, including Logistic Regression, SVM with linear and RBF kernels, Multinomial Naive Bayes, Decision Tree, and Random Forest (all with default parameters), serve as traditional ML baselines.
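A sketch of these baselines with scikit-learn is given below; dataset loading is omitted, and texts and labels stand for the preprocessed tweets and their gold labels.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

NGRAM_RANGES = {"Uni": (1, 1), "Bi": (2, 2), "Tri": (3, 3), "Uni+Bi": (1, 2)}
BASELINES = {
    "LR": LogisticRegression(),
    "SVM (Linear)": SVC(kernel="linear"),
    "SVM (RBF)": SVC(kernel="rbf"),
    "MNB": MultinomialNB(),
    "DT": DecisionTreeClassifier(),
    "RF": RandomForestClassifier(),
}

def run_baselines(texts, labels):
    """5-fold cross-validated macro F1 for each n-gram setting and classifier."""
    for ngram_name, ngram_range in NGRAM_RANGES.items():
        for model_name, classifier in BASELINES.items():
            pipeline = Pipeline([
                ("tfidf", TfidfVectorizer(ngram_range=ngram_range)),
                ("clf", classifier),
            ])
            scores = cross_val_score(pipeline, texts, labels, cv=5, scoring="f1_macro")
            print(f"{ngram_name:7s} {model_name:12s} macro F1 = {scores.mean():.4f}")
```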

Deep learning models

DL experiments were conducted utilizing two neural network-based models and fastText embeddings for all languages involved in the study. fastText embeddings are publicly available in numerous languages, including those used in our research, making them an ideal choice. Further, fastText embeddings were chosen for this study because they provide word representations that capture both syntactic and semantic information, and they are trained on large corpora covering various languages53,54.

Recent sentiment analysis and text classification studies have shown that combining static and contextual embeddings can enhance classification accuracy55, which supports our choice of integrating fastText embeddings in DL experiments.

This comprehensive coverage ensures that the embeddings are robust and applicable across different linguistic contexts, enhancing the performance of our neural network models.
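For illustration, an embedding matrix for the dataset vocabulary can be built from the publicly released fastText vectors as sketched below; the file name and the vocabulary mapping are placeholders.

```python
import numpy as np
from gensim.models import KeyedVectors

def build_embedding_matrix(vocab, vec_path="cc.es.300.vec", dim=300):
    """vocab maps token -> integer index; index 0 is reserved for padding."""
    vectors = KeyedVectors.load_word2vec_format(vec_path)  # fastText .vec (text) format
    matrix = np.zeros((len(vocab) + 1, dim), dtype="float32")
    for token, idx in vocab.items():
        if token in vectors:
            matrix[idx] = vectors[token]                         # pre-trained vector
        else:
            matrix[idx] = np.random.normal(scale=0.1, size=dim)  # random init for OOV tokens
    return matrix
```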

The descriptions of the neural network-based models used in this study are presented as follows.

Table 7 Summary of layers in the CNN model.

Convolutional neural network

A Convolutional Neural Network (CNN) is well-suited for capturing spatial hierarchies in data. CNNs are particularly effective in processing grid-like data structures, and when applied to text sequences they extract significant features through convolutional operations56. This capability makes CNNs advantageous for tasks such as text classification and natural language processing.

The structure of the CNN model used in this experiment starts with an embedding layer that transforms input tokens into dense vector representations using pre-trained fastText embeddings. This embedding layer is initialized with a matrix of fastText 300D embeddings. Following the embedding layer, the model employs several convolutional layers to extract features from the text. Each convolutional layer uses multiple filters (kernels) of varying sizes to capture different n-gram features. Specifically, the model applies filters with sizes [1, 2, 3, 5], corresponding to unigrams, bigrams, trigrams, and 5-grams. Each filter size has 36 filters, which generate feature maps for different patterns in the text. To prevent overfitting and enhance model generalization, a dropout layer is included with a dropout rate of 0.1. This means that 10% of the neurons are randomly set to zero during training, which helps to reduce the model’s dependency on specific features and improves its robustness.

The output from the convolutional layers is then passed through a fully connected layer. The input size to this layer is determined by the product of the number of filter sizes and the number of filters, representing the concatenated feature maps from all convolutional layers. The fully connected layer maps these features to the output classes, which were 2 for binary and 4 for multiclass tasks.

In the forward pass, the process begins with converting input token indices into dense vectors via the embedding layer. The embedded input is then reshaped to add a channel dimension, making it compatible with the convolutional layers. Convolutional operations followed by ReLU activation extract and enhance feature maps, which are then pooled using max pooling to reduce dimensionality. The pooled features from all convolutional layers are concatenated into a single vector, which is subsequently passed through a dropout layer to avoid overfitting. Eventually, the feature vector is fed into the fully connected layer to generate logits, representing the class scores for classification. A summary of layers for the CNN model is shown in Table 7.
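A PyTorch sketch of this architecture (frozen fastText embeddings, filter sizes [1, 2, 3, 5] with 36 filters each, dropout 0.1, and a final fully connected layer) is given below; it mirrors the description above but is not the authors’ exact implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TextCNN(nn.Module):
    def __init__(self, embedding_matrix, num_classes,
                 filter_sizes=(1, 2, 3, 5), n_filters=36, dropout=0.1):
        super().__init__()
        weights = torch.tensor(embedding_matrix, dtype=torch.float32)
        self.embedding = nn.Embedding.from_pretrained(weights, freeze=True)
        embed_dim = weights.size(1)
        # one 2D convolution per filter size, spanning the full embedding dimension
        self.convs = nn.ModuleList(
            nn.Conv2d(1, n_filters, (size, embed_dim)) for size in filter_sizes
        )
        self.dropout = nn.Dropout(dropout)
        self.fc = nn.Linear(len(filter_sizes) * n_filters, num_classes)

    def forward(self, token_ids):                          # (batch, seq_len)
        x = self.embedding(token_ids).unsqueeze(1)         # add channel dim: (batch, 1, seq_len, dim)
        feature_maps = [F.relu(conv(x)).squeeze(3) for conv in self.convs]
        pooled = [F.max_pool1d(fm, fm.size(2)).squeeze(2) for fm in feature_maps]
        features = self.dropout(torch.cat(pooled, dim=1))  # concatenated feature vector
        return self.fc(features)                           # class logits
```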

Bidirectional long short-term memory

The other NN-based model employed in this study was a Recurrent Neural Network (RNN), specifically a Bidirectional Long Short-Term Memory (BiLSTM) network. RNNs are designed to handle sequential data by maintaining a form of memory that captures information about previous elements in the sequence. LSTMs, a type of RNN, address the vanishing gradient problem, allowing the network to learn long-term dependencies in the data57,58. This characteristic is essential for understanding context and sequence in natural language tasks.

The architecture of the BiLSTM model in this study begins with an embedding layer that converts input words, represented as indices, into dense vectors of fixed size. These embeddings, which are pre-trained and remain static during training, provide rich semantic information that enhances the model’s ability to understand and process text. By using fastText 300D embeddings, the model captures word meanings based on their context in large corpora, which is especially beneficial when working with limited task-specific data.

At the heart of the model is the BiLSTM layer, which processes the sequence of word embeddings to capture contextual information from both past and future directions within the text. This bidirectional approach allows the model to understand dependencies between words, regardless of their position in the sequence. The LSTM’s hidden layer has 64 units, and the bidirectional nature effectively doubles the hidden state size. The processed sequence is then subjected to two pooling operations, average pooling and max pooling, which summarize the sequence information into fixed-size vectors. These pooled vectors are concatenated to form a single, richer representation of the text.

Following the concatenation, the model applies a linear transformation to reduce the dimensionality of the concatenated vector, which has been designed to accommodate the input from both pooling strategies. A ReLU activation function introduces non-linearity, helping the model to capture complex patterns in the data. To prevent overfitting, dropout is applied with a rate of 0.1, randomly setting 10% of the input units to zero during training. This regularization technique ensures that the model learns more robust features that do not rely too heavily on specific neurons. The final stage of the model involves an output layer that maps the features extracted by the previous layers to the target classes. This layer produces logits, which are unnormalized scores for each class. During training or inference, these logits can be passed through a softmax function to obtain class probabilities. The forward pass through the network is sequential, with each layer’s output serving as the input to the next, culminating in the final classification logits. A summary of layers used in the proposed BiLSTM model is presented in Table 8.
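A corresponding PyTorch sketch of the BiLSTM model is shown below (frozen fastText embeddings, 64 hidden units per direction, average and max pooling, dropout 0.1); the size of the intermediate linear layer is an assumption, since the text does not specify it.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class BiLSTMClassifier(nn.Module):
    def __init__(self, embedding_matrix, num_classes,
                 hidden_size=64, reduced_dim=64, dropout=0.1):   # reduced_dim is assumed
        super().__init__()
        weights = torch.tensor(embedding_matrix, dtype=torch.float32)
        self.embedding = nn.Embedding.from_pretrained(weights, freeze=True)
        self.lstm = nn.LSTM(weights.size(1), hidden_size,
                            batch_first=True, bidirectional=True)
        # avg + max pooling of the bidirectional outputs -> 4 * hidden_size features
        self.reduce = nn.Linear(4 * hidden_size, reduced_dim)
        self.dropout = nn.Dropout(dropout)
        self.out = nn.Linear(reduced_dim, num_classes)

    def forward(self, token_ids):                       # (batch, seq_len)
        x = self.embedding(token_ids)                   # (batch, seq_len, embed_dim)
        sequence, _ = self.lstm(x)                      # (batch, seq_len, 2 * hidden_size)
        avg_pool = sequence.mean(dim=1)
        max_pool, _ = sequence.max(dim=1)
        features = torch.cat([avg_pool, max_pool], dim=1)
        hidden = self.dropout(F.relu(self.reduce(features)))
        return self.out(hidden)                         # class logits
```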

Table 8 Summary of BiLSTM Model Structure.

Transformers

The effectiveness of hybrid contextualized models in fine-grained text classification tasks, such as aspect category detection, has been recently demonstrated, reinforcing the role of transformers in different multiclass classification tasks59,60.

The experiments involving transformer-based models were carried out using BERT and RoBERTa models. For each language, a language-specific version of BERT and RoBERTa was utilized. Additionally, the multilingual BERT (mBERT) and XLM-RoBERTa were fine-tuned across all languages. This setup allows us to conduct three main types of comparisons within the transformer experiments. First, we can compare the performance of language-specific models against multilingual ones for each language. Second, we can examine the performance of multilingual models across the three languages. Furthermore, we can assess which language-specific model performed better for its respective language. Beyond these comparisons, this experimental design provides insights into the effectiveness of language-specific versus multilingual models. By analyzing the results, we can identify patterns and performance trends that may inform future model development and fine-tuning strategies. Additionally, these comparisons help us understand the nuances of each language’s impact on model performance, which could be valuable for tasks involving cross-lingual transfer learning and multilingual applications.

The exact transformer models used, along with their results, are presented in Sect. “Transformers”. The transformer-based models from HuggingFace [https://huggingface.co/] were fine-tuned using the SimpleTransformers library [https://simpletransformers.ai/]. SimpleTransformers is designed to streamline the process of model setup, training, and evaluation61,62. For our experiments, fine-tuning was conducted over 15 epochs using 5-fold cross-validation. The models were trained with a maximum sequence length of 100 tokens and a learning rate of 3e-5.
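A sketch of this fine-tuning setup with SimpleTransformers is shown below for a single fold; the data frames are placeholders, and the German BERT checkpoint is shown as one example of the monolingual models used.

```python
from simpletransformers.classification import ClassificationArgs, ClassificationModel

model_args = ClassificationArgs(
    num_train_epochs=15,
    max_seq_length=100,
    learning_rate=3e-5,
    overwrite_output_dir=True,
)

# train_df / eval_df: pandas DataFrames with "text" and "labels" columns for one CV fold
model = ClassificationModel(
    "bert",
    "bert-base-german-dbmdz-uncased",
    num_labels=4,                      # 2 for the binary task
    args=model_args,
)
model.train_model(train_df)
results, model_outputs, wrong_predictions = model.eval_model(eval_df)
```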

Results

This section presents the results and comparisons of the performance of various experiments conducted on the proposed dataset. The average macro F1 score is used as the primary metric for comparing performances. Additionally, a more detailed analysis of the results from the best-performing experiments is provided in Sect. "Findings and analysis".

Traditional machine learning models

The English language results, outlined in Table 9, exhibit consistent trends across various traditional ML models in binary hope speech detection. SVM models with diverse kernels consistently outperformed their counterparts. Notably, the SVM (RBF) model, utilizing solely unigrams, achieved the highest F1 score of 0.8006. Conversely, models relying solely on bigrams or trigrams demonstrated inferior performance, with the optimal bigram configuration attaining an F1 score of 0.7026 with SVM (Linear). Performance notably improved upon combining unigrams with bigrams, yielding an F1 score of 0.7997 with SVM (Linear). Similarly, in the multiclass setting, unigrams exhibited superior performance, with SVM (Linear) achieving an F1 score of 0.5169. In contrast, bigrams and trigrams individually yielded substantially lower scores, with the combination of unigrams and bigrams resulting in an F1 score of 0.4983 with SVM (Linear).

Table 9 Results of traditional machine learning experiments with various n-grams in the English language.

For the Spanish language results, detailed in Table 10, a comparable pattern emerges. Unigrams emerged as the most effective feature, with SVM (RBF) achieving a score of 0.7757. Analogous to the English findings, both bigrams and trigrams exhibited diminished performance, with the best bigram configuration attaining a score of 0.7008 with SVM (Linear). A marginal improvement was observed upon combining unigrams with bigrams, resulting in a score of 0.7779 with SVM (Linear). However, the multiclass hope speech detection task demonstrated a notable decline in performance, with unigrams and SVM (Linear) achieving 0.4896. Bigrams and trigrams similarly showed diminished scores, with the combination of unigrams and bigrams yielding 0.4701 with SVM (Linear).

Table 10 Results of traditional machine learning experiments with various n-grams in the Spanish language.

For the German language, the results depicted in Table 11 show a parallel trend, mirroring those of English and Spanish. In binary hope speech detection, unigrams paired with both SVM (Linear) and SVM (RBF) delivered the highest score of 0.8021. Conversely, bigrams and trigrams displayed inferior performance, with bigrams reaching a maximum of 0.7100 with SVM (Linear). A modest enhancement was observed upon combining Uni+Bi-grams, improving performance to 0.8101 with SVM (Linear). Similarly, in multiclass hope speech detection, unigrams paired with SVM (Linear) proved most effective, achieving a score of 0.4180. Bigrams and trigrams registered lower scores, with the Uni+Bi-gram combination reaching 0.4094 with SVM (Linear).

Table 11 Results of traditional machine learning experiments with various n-grams in the German language.

Deep learning models

Table 12 presents the average macro F1 scores of two DL models, BiLSTM and CNN, for binary and multiclass hope speech detection across English, Spanish, and German. In binary hope speech detection, CNN slightly outperforms BiLSTM in Spanish and German, with average macro F1 scores of 0.7567 and 0.7847, respectively, indicating that it captures relevant features more effectively in these languages. In contrast, BiLSTM performs marginally better in English, with an average macro F1 score of 0.7869.

On the other hand, for multiclass hope speech detection, BiLSTM outperforms CNN in English and German, whereas CNN slightly surpasses BiLSTM in Spanish. The results show that both models perform better in binary classification than in multiclass classification, with the highest scores observed in English, where the BiLSTM model reaches an average macro F1 score of 0.5703.

Comparing their performance with traditional ML models shows a clear improvement in the multiclass setting. In the binary task, however, although DL models surpass traditional ML models overall, the best results are still obtained with traditional ML models and n-gram features. A detailed comparison and analysis are provided in Sect. “Findings and analysis”.

Table 12 Results of deep learning experiments.

Transformers

In this study, the performance of monolingual and multilingual versions of BERT and RoBERTa architectures were evaluated for binary and multiclass hope speech detection on the proposed datasets. Table 13 presents detailed results of these models.

In both binary and multiclass hope speech detection tasks, monolingual models generally outperform multilingual models in their respective languages. For instance, in the binary classification task, monolingual German BERT (bert-base-german-dbmdz-uncased) achieved the highest F1 score of 0.8704, a score matched by the multilingual XLM-RoBERTa model. Similarly, for English and Spanish, monolingual BERT models perform slightly better than their multilingual counterparts.

In most cases, RoBERTa models showed slightly lower performance than BERT models. Notably, the bertin-project/bertin-roberta-base-spanish model showed a considerable drop in Spanish performance in both tasks.

In the multiclass classification task, performance drops across the board compared to the binary classification task, reflecting the increased difficulty of multiclass categorization. The monolingual models again outperform the multilingual ones, but the margin is narrower than in binary classification.

Interestingly, the XLM-RoBERTa model performed relatively well among multilingual models in both tasks, suggesting it might be better suited for cross-lingual tasks compared to mBERT. However, monolingual models still hold the advantage for specific language tasks, likely due to being fine-tuned on language-specific datasets.

These findings highlight that while multilingual models offer versatility across languages, monolingual models provide superior performance when tailored to individual languages. This insight is crucial for selecting appropriate models based on the specific requirements of a given task, especially in multilingual contexts.

Table 13 Results of transformers experiments.

Findings and analysis

Our proposed datasets for Spanish and German not only replicate the structure of PolyHope but expand its linguistic scope, enabling cross-cultural comparisons. This makes MIND-HOPE the first multilingual resource that facilitates fine-grained hope speech analysis beyond English. Compared to English-focused resources, these datasets offer deeper insights into how hope is communicated across languages, revealing important linguistic patterns that monolingual resources cannot capture.

The summary of the top-performing models for each learning approach can be found in Table 14. Across various languages and models, performance varies significantly. For instance, bert-base-german-dbmdz-uncased consistently demonstrates strong results in German, whereas xlm-roberta-base excels in English across both binary and multiclass detection tasks. Transformers generally outshine traditional machine learning and sometimes DL methods, particularly evident in binary hope speech detection tasks where xlm-roberta-base and bert-base-german-dbmdz-uncased consistently achieve high F1 scores across all languages.

Complex models like Transformers (e.g., bert, xlm-roberta) tend to yield superior outcomes compared to simpler models such as SVMs and CNNs, especially in tasks requiring nuanced comprehension like hope speech detection. While Transformers perform well overall, their efficacy can slightly vary depending on language and specific dataset characteristics, as observed across English, Spanish, and German datasets. Therefore, the findings indicate that Transformers, particularly models like bert-base-german-dbmdz-uncased and xlm-roberta-base, show promise for hope speech detection across multiple languages, delivering robust performance across diverse tasks compared to traditional ML and DL methods. A detailed analysis of each learning approach is discussed below:

Table 14 Best performing model based on each learning approach.

  • Traditional ML models:

    • Unigrams’ Superior Performance: Unigrams, alone and combined with other n-grams, consistently yield the highest F1 scores across all models and languages for both binary and multiclass tasks. This indicates that single-word features are highly informative for the classification tasks at hand.

    • Combination of N-grams: Combining unigrams with bigrams (Uni+Bi-grams) generally results in better performance compared to bigrams or trigrams alone. However, the improvement is not always significant and often does not surpass the performance of unigrams alone, especially for binary hope speech detection.

    • Multiclass Setting Challenges: The lower F1 scores observed in the multiclass setting suggest that distinguishing between different types of hope is inherently more challenging for traditional ML models with simple n-grams. This could be due to overlapping features among hope classes and an imbalance in class distribution.

    • Model Performance: SVM models, particularly with linear and RBF kernels, often outperform other models (LR, DT, RF, and MNB) in both Binary and Multiclass settings. This suggests that SVMs are well-suited for the feature space and nature of the dataset used in these experiments.

    • Language Consistency: The trends observed in the performance of different n-grams and models are consistent across English, Spanish, and German datasets. This suggests that the findings may generalize well across different languages with similar text classification tasks.

    • N-gram Limitations: While the combination of n-grams improves results slightly, it is evident that higher-order n-grams (Bigrams and Trigrams) alone do not provide significant additional information compared to Unigrams. This may be due to the sparsity and higher dimensionality associated with higher-order n-grams.

  • Deep learning models:

    • Model Performance Comparison: BiLSTM generally performs slightly better overall across both binary and multiclass hope speech detection tasks, particularly showing strength in English datasets. CNN, however, exhibits competitive performance, often performing better in Spanish and showing comparable results in German.

    • Language-Specific Performance: BiLSTM consistently outperforms CNN in both binary and multiclass tasks in English, suggesting inherent advantages for sequence-based models such as BiLSTM in this language. However, CNN demonstrates varied performance compared to BiLSTM, showing slight superiority in binary tasks for Spanish and marginal improvements in multiclass tasks for German. These findings highlight the nuanced impact of language on model performance.

    • Task Complexity Impact: Binary hope speech detection consistently yields higher accuracies compared to multiclass detection for both BiLSTM and CNN models across all languages. This is in line with the results of other learning approaches, too. This indicates that the simpler binary classification task is generally easier to model accurately than predicting multiple classes.

  • Transformers:

    • Monolingual vs. Multilingual Models: The monolingual models generally outperformed multilingual models in their respective languages, especially in binary classification tasks. However, in more challenging multiclass classification tasks, performance decreases for all models, and while monolingual models still have an edge over multilingual ones, the difference in performance is smaller. The superior performance of monolingual transformers can be attributed to language-specific fine-tuning and pretraining on domain-relevant corpora. These models benefit from richer contextual representations and vocabulary coverage tailored to the target language. In contrast, multilingual models often trade off representation quality for broader cross-lingual coverage, which may reduce effectiveness in single-language tasks63,64.

    • BERT vs. RoBERTa Models: RoBERTa models generally perform slightly worse than BERT models. Specifically, the bertin-project/bertin-roberta-base-spanish model shows a significant decrease in performance when applied to tasks in Spanish, both in binary and multiclass classifications.

    • Multilingual Model Performance: The XLM-RoBERTa model performs well across multilingual tasks, suggesting it may be more effective for cross-lingual applications than mBERT. Nonetheless, monolingual models still excel in tasks specific to individual languages, benefiting from fine-tuning on datasets tailored to those languages.

Class-wise performance analysis

Table 15 provides class-wise precision, recall, and F1-scores for the best-performing models in both Spanish (xlm-roberta-base) and German (bert-base-german-dbmdz-uncased) across binary and multiclass classification tasks. In the binary setting, both languages exhibit strong and balanced performance across the two classes, with German slightly outperforming Spanish.

In the multiclass setting, performance becomes more variable. In both languages, Not Hope and Generalized Hope consistently receive the highest F1-scores (above 0.79 in German and 0.73 in Spanish), indicating that these two categories are more easily distinguishable by the models. This aligns with their clearer semantic boundaries and higher frequency in the dataset. In contrast, the Realistic Hope and Unrealistic Hope classes suffer from lower performance, with German yielding F1-scores of 0.61 and 0.46, and Spanish slightly outperforming German in Unrealistic Hope (0.60 vs. 0.46), while lagging behind in Realistic Hope (0.53 vs. 0.61).

These discrepancies reflect the challenges posed by low-resource classes and semantic ambiguity. For example, Unrealistic Hope is particularly difficult for both models, likely due to its nuanced linguistic expression and limited annotated examples. Interestingly, although German models benefit from higher agreement scores among annotators, Spanish models perform slightly better in capturing subtle expressions of unrealistic hope, suggesting potential differences in how such emotions are expressed across languages or how effectively the model tokenizers capture those nuances.

This fine-grained breakdown confirms that model performance is shaped not only by class imbalance but also by language-specific linguistic features and the granularity of class definitions. These findings reinforce the need for targeted strategies, such as class-aware loss functions, hierarchical modeling, or prompting methods, particularly to improve performance on subtle or underrepresented categories.
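Class-wise scores of this kind can be obtained directly from the fold predictions, for example with scikit-learn’s classification_report; the snippet below is a small illustrative helper.

```python
from sklearn.metrics import classification_report

CLASSES = ["Not Hope", "Generalized Hope", "Realistic Hope", "Unrealistic Hope"]

def class_wise_scores(y_true, y_pred):
    """Per-class precision, recall, and F1, as reported in Table 15."""
    return classification_report(y_true, y_pred, labels=CLASSES, digits=2)
```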

Table 15 Class-wise scores using best-performing models for Spanish (xlm-roberta-base) and German (bert-base-german-dbmdz-uncased).

Limitations

Although our primary goal was to establish and evaluate the Spanish and German hope speech datasets, with the aim of providing a foundation for future multilingual and fine-grained emotion classification tasks, there are certain methodological aspects and enhancements that were not addressed in the current work to maintain the focus and scope of the study. These limitations offer potential avenues for future work:

  • Label imbalance in the datasets: One notable limitation of this study is the imbalance in label distribution across the datasets, as shown in Figure 1. In particular, the Realistic and Unrealistic Hope categories are underrepresented compared to Generalized Hope and Not Hope. While we used stratified 5-fold cross-validation to ensure representative sampling during training and evaluation, we did not apply specific strategies such as oversampling, undersampling, or class weighting to counteract this imbalance. This decision was made to maintain comparability with prior work. However, we acknowledge that such imbalance may affect classifier performance, particularly for underrepresented classes, and may contribute to lower macro F1 scores in multiclass settings. In future work, we plan to explore methods such as Synthetic Minority Over-sampling Technique (SMOTE), focal loss, and class-specific weighting in model training. These adjustments may help models better recognize rare classes and improve fairness and overall robustness in hope speech detection.

  • Large language models and prompt-based methods: Another limitation of this study is the exclusion of large language models (LLMs) such as GPT-4 and ChatGPT, which have shown impressive results in zero-shot learning (ZSL) and few-shot learning (FSL) settings. These models operate fundamentally differently from the supervised transformer-based models evaluated in this paper and require a separate evaluation setup. Due to space constraints and the focus on benchmarking the newly developed MIND-HOPE datasets with reproducible supervised techniques, we chose not to include LLM-based experiments in this study. However, we recognize the promise of prompt engineering and ZSL/FSL with LLMs and plan to investigate these approaches in future work to further improve hope speech detection, especially in cross-lingual and low-resource scenarios.

  • Annotation ambiguity and category overlap: While the annotation process achieved substantial agreement (Cohen’s K = 0.78 for Spanish and 0.85 for German in binary; Fleiss’ K = 0.71 for Spanish and 0.81 for German in multiclass), the relatively lower agreement in the multiclass setting reflects the inherent difficulty of distinguishing between nuanced categories like Realistic Hope and Unrealistic Hope. Interestingly, the best-performing monolingual transformer model also showed comparatively lower F1 scores for these underrepresented and conceptually adjacent classes. This alignment between human and model confusion suggests that some category boundaries may be ambiguous even for expert annotators. Future work may explore revising or merging categories, providing more concrete annotation examples, or applying probabilistic or fuzzy labeling to better capture annotator uncertainty.

Conclusion and future work

This study contributed significantly to the field of hope speech detection through several key advancements. The development of the first multiclass hope speech detection datasets for Spanish and German represents a notable advancement, addressing a critical gap in the literature. These datasets facilitate the analysis and modeling of hope speech in languages beyond English, thereby supporting more comprehensive research and practical applications in diverse linguistic contexts.

A comprehensive review of existing datasets and techniques was provided, which could be a valuable resource for understanding the current state of hope speech detection. This review highlighted both recent advancements and existing limitations, offering insights into the evolution of methodologies and identifying areas for further research. By synthesizing current knowledge, the review contributed to a clearer understanding of effective techniques and underscored the need for continued innovation.

Additionally, we presented comparative analyses of various learning approaches, including traditional ML, DL, and Transformer models, which revealed important findings about their relative efficacy. Transformers generally outperformed traditional ML and DL methods, especially in binary classification tasks, indicating their robustness in capturing nuanced patterns of hope speech. They also demonstrated clear advantages in multiclass classification scenarios, providing valuable guidance for selecting appropriate methodologies based on specific task requirements.

Furthermore, the evaluation of multilingual transformers against monolingual models further highlighted the strengths and limitations of different approaches. While multilingual models such as XLM-RoBERTa show promise in cross-lingual tasks, monolingual models typically achieve superior performance for language-specific tasks. This insight is crucial for selecting models tailored to specific languages, emphasizing the need for both multilingual versatility and monolingual precision.

In future work, we plan to organize a shared task based on the MIND-HOPE datasets to foster broader participation and standardized evaluation. Additionally, we will explore the use of large language models (LLMs) such as GPT-4 and multilingual instruction-tuned models for hope speech detection in ZSL and FSL settings. This includes designing and evaluating handcrafted and template-based prompts, comparing LLM outputs to supervised baselines, and assessing generalization in low-resource and cross-lingual scenarios. These steps aim to extend the utility of MIND-HOPE into the space of prompt-based learning and instruction-following models.