Introduction

The emergence of a global pandemic and its widespread transmission has had far-reaching consequences across social, economic, political, and virtual domains. As the crisis rendered billions of people unable to safely leave their homes or interact physically, there was a parallel surge in human interaction within cyberspace. This level of human-computer interaction in virtual spaces is unprecedented, opening new avenues for investigation. Comprehensive analyses are therefore needed to understand these interactions so that relevant stakeholders can formulate strategies and implement effective policies based on the insights they yield. In this context, we conduct mixed-method research to examine the patterns inherent in the changes in human sentiment expressed online in response to COVID-19. This work is based on our prior dissertation work, “Bimodal longitudinal investigation on changes in sentiments over social media interactions owing to COVID-19 pandemic”, completed at Bangladesh University of Engineering and Technology, 20221. Although the primary analysis and experiments remain unchanged, we have adjusted the narrative to fit the format and readership of the journal, facilitating greater visibility and impact within the research community.

Motivation

Social media have become a valuable source of data for scientists, as these platforms house large volumes of raw user data, mostly textual but also in image, audio, and video formats, updated daily. Popular social media platforms are a tremendous source of information owing to their rapid information dissemination, and many individuals express their opinions through them2. During catastrophes and disasters, social media has played an increasingly essential role, emerging over the last five years as an important alternative to traditional media and ranking as the fourth most popular source of emergency information3. Sentiment analysis is pivotal for understanding people’s perceptions and supports decision-making4. Many recent studies, discussed in the related work section, apply sentiment analysis and topic modeling methods to social media data for various purposes, and these analyses often reveal new findings. For example, fear and anxiety are the two feelings people report over social media immediately after an earthquake, whereas calmness and unpleasantness are expressed clearly only after large tremors, not small ones5. Compared with positive sentiment, negative sentiment positively affects the forwarding volume, spreading depth, and network influence of information dissemination6. Thus, in a global pandemic situation, one possible research direction is to use the findings of psychological and sociological studies on people’s behavior (e.g., hope, fear) to help decision-makers better comprehend how to deal with the problem. Another potential direction is to investigate how this information can be reprocessed so that it is immediately usable by the appropriate authorities7.

According to many research studies8,9,10,11,12, Twitter is an effective communication tool for gaining a better grasp of public concern and awareness regarding COVID-19. Because the pandemic is occurring in the digital age, COVID-19 inevitably affects social media trends, which are inextricably entwined with human lives. To reveal these effects, we investigate engagement across Twitter accounts as our primary data source for sentiment analysis. Twitter users discuss health issues13, public health interventions14, and pandemic control15. Analysis of Twitter is compelling because health-related data analysis may help governments determine which public health messages are effective in combating the COVID-19 epidemic. From this platform’s data, valuable healthcare resources can be created to help construct a long-term strategy16, and decision-making during the epidemic can be strengthened. More broadly, such data can inform social media marketing, and the user demands it captures may create opportunities for healthcare providers. Extracting relevant information from social media data also requires an efficient analysis technique. To anticipate future social issues and combat them cautiously, this study therefore focuses on efficient and effective analysis techniques for pandemic data.

Research objectives

This study aims to explore and quantify the dynamic changes in public sentiment expressed on social media in response to the COVID-19 pandemic, addressing the following key research questions:

  • RQ1. How have sentiments shared over social media shifted after the COVID-19 outbreak?

    After the outbreak of COVID-19, has there been a change in public opinion? If so, what specific sentiments increased or decreased here? What about the sentiments expressed over images?

  • RQ2. Is there a correlation between text and image sentiments?

    What if users express sentiment differently in text and image posts? This could lead to a bimodal study for sentiment analysis.

  • RQ3. To what degree have sentiments changed?

    Quantifying the changes is crucial if we want to measure the impact. Simply stating ‘rise’ and ‘fall’ is not sufficient if we are trying to perform an in-depth scientific study.

  • RQ4. How quickly did sentiments stabilize after the outbreak?

    What happened after the outbreak? Did the sentiments stabilize? What was the duration for sentiments to get back to normal? Was it the same for all sentiments and all modes (text and image)?

Previous research attempts to answer only parts of our research questions individually. These studies depict changes in sentiment but include no benchmark from the preceding year, lacking vital context. Furthermore, while sentiment analysis of individual modalities has been performed on COVID-19 data, correlating image sentiment with text sentiment is, to the best of our knowledge, novel. Our research questions are designed to capture these aspects. To address them, we carefully selected 569 users who posted Tweets about COVID-19 during the pandemic and collected all of their public Tweets and images shared during 2019 and 2020. We carried out several statistical analyses to provide conclusive answers to the research questions at hand.

In the process of answering the research questions, we focus on three different research objectives, and our contributions according to the objectives are as follows.

  • Development of an Enhanced Sentiment Classification Technique: We introduce a novel sentiment classification technique for text-based Tweets, leveraging and improving existing sentiment classifiers. This advancement addresses the unique linguistic characteristics and styles of social media posts, offering a more accurate analysis of textual sentiments.

  • Comprehensive Longitudinal Bimodal Sentiment Study: Our study is pioneering in conducting a longitudinal bimodal analysis, examining sentiments through both text and images. By analyzing trends over two significant periods (pre-COVID and during COVID-19), we provide a thorough understanding of how sentiments have evolved across different modalities.

  • Identification of Correlation Patterns Between Text and Image Sentiments: We uncover and quantify the correlations between sentiment trends in textual and visual content. Our findings reveal potential similarity patterns in how sentiments are expressed over time in both text and images, offering valuable insights into the multimodal nature of social media interactions during the pandemic.

Our enhanced method builds on existing sentiment classifiers, improving their accuracy and adaptability to text-based Tweets, which is directly relevant to ongoing computational research in natural language processing (NLP). Additionally, our work contributes to the growing interest in longitudinal bimodal analysis within the community. By examining sentiments across two modalities (text and images) over two critical periods (pre-COVID and during COVID-19), our study pioneers a comprehensive approach to understanding how public sentiment evolves. This bimodal perspective integrates sentiment analysis with image processing, offering a multidimensional framework that aligns with the interdisciplinary interest of the computing and information science fields. Our contributions strongly resonate with the computing and information science community’s focus on advancing data-driven techniques and applying them to real-world challenges. Through this study, we demonstrate the critical role of computing technologies in addressing complex social phenomena, such as the emotional impacts of the COVID-19 pandemic, thereby enriching the community’s understanding of the societal applications of computational methods.

Related work

During endemics, epidemics, and pandemics, researchers have explored how individuals express themselves through cyberspace. It is also common practice to examine the role of social media in shaping people’s lives during such outbreaks. The following three subsections discuss studies that have investigated cyberspace, the influence of social media on disease outbreaks, and the typical methodologies employed by these researchers.

Cyberspace-based interactions during disease outbreak

Cyberspace has become a vital extension of everyday life in the digital age, with cyber activities deeply intertwined with modern communication practices17. Social media has shaped public opinion on various events, sparking research alongside the rise of new technology18,19,20. One study21 analyzed how the unrestricted nature of social media coverage heightened public anxiety and fear during the Ebola outbreak in Europe and the USA. Another study22 examines the evolution of social representations of COVID-19 on Chinese social media, revealing a shift from clinical to broader socio-economic and political discourses. Study23 assessed risk factors for probable anxiety and depression through an online survey on WeChat, a leading social media platform in China. Study24 also explored how individuals engaged with different travel-focused social media platforms in response to the Zika virus outbreak. A recent study25 analyzed 3.7 million help-seeking posts on Weibo during COVID-19, highlighting the platform’s crucial role in crisis response, as the study identified Granger causality between help-seeking posts and confirmed COVID-19 cases with an eight-day time lag. Another study26 developed a model using graph convolutional networks and ensemble learning to recognize multimodal negative sentiment in COVID-19-related data from Sina Weibo.

During the COVID-19 pandemic, a study27 revealed how public attention was significantly influenced by social media trends. Study28 explored how online social support influenced the public’s cognition, emotion, and belief in overcoming COVID-19, offering insights into mitigating the negative effects of quarantine. An observational study29 analyzed a large geo-tagged Twitter dataset (UsaGeoCov19) to examine public sentiment and work engagement during the COVID-19 pandemic across the United States. Another study30 examined public interest and emotional responses to COVID-19 by analyzing trending topics on the Sina Microblog hot search list. In addition, research31 focused on the sentiments and emotions expressed in news headlines. A comprehensive literature review32 delved into social media’s role in emerging infectious disease (EID) outbreaks, evaluating the methodological rigor of prior studies.

Further studies emerged examining the role of cyberspace in mental health, such as research33 that investigated how elderly women in Southeast Asia used online spaces to support their mental well-being during the pandemic. Another study34 explored the use of Twitter by essential workers to better understand the pandemic’s impact on their mental health. The aforementioned studies mostly focused on particular topics on social media platforms that are widely used in particular countries. In our study, we focus solely on Twitter data overall, as the medium is a widespread communication tool whose usage spikes during prominent events. Analyzing Tweets thus offers valuable insights into public opinion and trends related to the COVID-19 pandemic.

Impacts of social media during other disease outbreaks

Social media has been found to have significant potential in shaping crisis management decision-making through its sense-giving capabilities35. People’s self-expression on social media has proven to be a syndromic surveillance tool for effective disease surveillance36. Studies37,38 have explored how social media can be leveraged as a real-time tracker to enhance pandemic surveillance systems. Additionally, research has examined the relationship between online message activity and real-world disease outbreaks, such as Influenza and Pertussis. One such study39 investigated the feasibility of using Twitter as a surveillance tool during outbreaks of these diseases.

Practitioners have built a research project to explore social media crisis communications during the Influenza outbreak and to help public health policymakers harness the potential of social media platforms40. Another study41 provided insights from the social and behavioral sciences to outline human behavior with expert recommendations using social media. However, owing to the largely unregulated nature of social media, it has also become a breeding ground for misinformation, rumors, and conspiracy theories. During the COVID-19 pandemic, research42 has examined how this “infodemic” ecosystem fosters panic in cyberspace, with ongoing investigations into its impact. Study43 investigates the interplay between COVID-19 misinformation and fact-checks on Twitter, revealing that while fact-checks are rapidly produced, their impact on curbing misinformation, especially concerning conspiracy theories, remains limited.

Recent research44 examined commenting behaviors on Weibo within the COVID-19 Super-Topic, discussing the design implications of social media platforms to better support individuals in need during crises. Another study45 found that individuals often turn to multiple social media platforms to express distress during the pandemic, emphasizing the need to understand the broader social media ecosystem in which people engage in self-disclosure. In our study, we focus on analyzing Tweets on COVID-19 and its trends on different timelines.

Approaches to sentiment analysis

Since the outbreak of COVID-19, significant scientific research has been carried out in this area. Most research focused on the sentiment of COVID-19-related text Tweets. A few studies include topic modeling using Latent Dirichlet Allocation (LDA)12,16,46, a generative probabilistic model that represents each document as a mixture of latent topics. The initial tweet datasets were often gathered using hashtags or COVID-19-related keywords. Most of the sentiment analyses were done using VADER12,47,48,49 or TextBlob9. Other studies utilized custom-built neural network models50, while convolutional neural networks (CNNs) were used in various efforts11,51,52 to detect sentiment in text-based tweets. Beyond the pandemic, Twitter data has been widely used to assess sentiments on a range of topics, including public opinion on political issues53, infectious diseases54,55, and other aspects of everyday life56.

Location and topic-based studies

In a study10, the researchers conducted a country-wise assessment of sentiment ratios during the COVID-19 pandemic. Another investigation47 focused on determining public opposition to specific topics, such as mask-wearing, lockdowns, and vaccines. Sentiments, both positive and negative, related to COVID-19 vaccines were analyzed across different platforms in a separate study57. Another study58 investigated and compared public sentiments in the context of COVID-19 vaccines expressed on many popular social media platforms like Facebook, Twitter, Instagram, and TikTok. The most common form of analysis in these studies involved detecting positive, negative, and neutral sentiments. However, studies like50 expanded the analysis to emotions, identifying feelings such as anger, hope, and optimism in user posts. Study59 analyzes optimism and pessimism in Twitter conversations about COVID-19, revealing insights into emotional shifts and user influence during the pandemic. Another study60 detects suicidal ideation and emotional distress in Thai social media content, revealing a significant increase in such expressions during the COVID-19 pandemic. Using sentiment analysis and topic modeling, an interesting study61 has been carried out by researchers to examine the tourist concern regarding the COVID-19 pandemic. The focus and purposes of these studies varied widely.

Some researchers aimed solely at tracking sentiments on various topics. One study48 collected tweets related to hospitality and healthcare during COVID-19, while also analyzing tourism-related tweets from the same period. In another study62, researchers specifically measured the extent of anti-China sentiments expressed through social media comments during different phases of the pandemic. This study also provided empirical evidence on political and international relations issues involving China and Japan. A separate study63 linked social gaming with well-being through sentiment analysis, suggesting that social media and online games could serve as tools for coping with stress during the pandemic. Other research focused on finding the COVID-19 situation in specific countries, including Nepal11, India50, Canada64, China6, Bangladesh65, and Nigeria8,66.

Knowledge graphs and frameworks

An existing study67 focused on building a knowledge graph based on COVID-related Tweets. Other studies47,68 focused on determining the authenticity of published news articles. References49,50,52 focused on building comprehensive frameworks for finding all possibly relevant posts on popular social media as well as in news articles.

Image sentiment detection

Study47 grouped images by clustering them based on their corresponding texts. Other research68 attributed image sentiment to the sentiment of the related text. Some studies focused on analyzing associated images to extract relevant metadata, such as user information49. Another effort68 developed a multimodal search engine in which sentiments derived from metadata were applied to corresponding images according to user-preferred weighting. One study52 built a framework for image sentiment analysis using the popular Fast R-CNN model; GIFs were handled with an optical character recognizer that separates text from images to determine polarity. Research69 also suggests that both user-provided and machine-generated labels for geo-tagged Instagram images hold valuable information for predicting county-level health statistics.

Change-point detection and analysis duration

Several studies have also incorporated change-point detection as part of their analysis. This method was commonly used in studies that either constructed knowledge graphs from Tweets67 or attempted to correlate Tweets with concurrent events49.

The duration of analysis varied significantly across the literature. Some studies9,10 focused on March or April, immediately following the COVID-19 outbreak, to observe minor changes, though they did not provide any pre-pandemic context. Other studies16 focused on the period between December 2019 and March 2020, aligning with the official announcement of COVID-19. Additionally, some studies examined the post-pandemic period, either by analyzing new cases50 or by assessing public perspectives on topics such as tourism48 and veracity assessment47.

Summary

To summarize, existing research has primarily concentrated on analyzing sentiments from text-based social media posts and employing topic modeling with Latent Dirichlet Allocation (LDA)10,12,16,46,70. These studies commonly utilized out-of-the-box Python libraries such as VADER12,47,48,49 or TextBlob9,71 for text-based sentiment detection. Moreover, most of the studies relied on a single algorithm for sentiment detection, which often increases the chance of misclassification. Furthermore, research on image-based sentiment analysis is limited47,49,68, with techniques typically confined to specific hashtags47, geographic locations72,73, or narrow timelines49.

Thus, to the best of our knowledge, an extended longitudinal study on the changes in trends of sentiments due to COVID-19 is still unexplored in the literature. Existing research lacks data from the pre-COVID period for comparison with the during-COVID period. To assist psychologists and policymakers in understanding the psychological impacts of the pandemic, a longitudinal study that provides a comprehensive assessment of sentiment trends is essential. Additionally, exploring correlations between sentiments expressed through text and images over social media remains an unexplored area that could offer a more complete view of the situation.

Research gaps in the existing studies

We identified the following limitations of existing works related to our study and note how our focus differs from those studies.

  • Most papers focused on the year 2020 (and some extended to June 2021). None focused on 2019, which would give context to the change in sentiment that happened due to COVID-19.

  • Most papers worked only with text Tweets (and performed topic modeling). A handful of papers worked with both text and images, but they strictly focused on a single topic (masks/restrictions/politics). Our study can serve as a baseline for those who are looking for the big picture.

  • Most papers focused on COVID-19 Tweets; ours covers the overall timelines of 569 users. This gives a unique picture of how the online social life of COVID-19-interested users changed, rather than only the COVID-19 sentiments.

  • Most of the works used VADER or TextBlob for sentiment analysis. Some papers were entirely dedicated to developing a state-of-the-art sentiment analyzer, but they did not conduct the full-scale analysis we did. We proposed an alternative heuristic for better text sentiment classification that combines both rule-based and neural network-based approaches.

These discovered shortcomings, particularly the lack of context, are critical for policymakers and health workers when making administrative choices. It would help them to know whether an increase in negative sentiment is attributable to COVID-19 or is simply a seasonal fluctuation. On top of context, the analysis has to include image sentiment, as people may express emotion differently in images than in text. A robust text sentiment classification technique is also needed, since most social media content is still text-based and most sentiment libraries are not fit to classify social media Tweets. Finally, a complete framework has to be laid out for carrying out this bimodal sentiment analysis to aid researchers with similar interests.

Dataset creation

The subjects of our investigation in our dataset74 are only those users who showed interest during the COVID-19 pandemic. To select these users, we collected Tweets based on COVID-related hashtags that are commonly referred to in other research literature75,76,77. The following 21 hashtag combinations were used for probing Tweets. Our goal was to select users actively focused on the infection rate, advancements in vaccine development, and the restrictions implemented due to COVID-19. Therefore, we included hashtags closely related to COVID-19, such as lockdown, worldometer, and social distance.

coronavirus; COVID-19; dashboard; comorbidity; lockdown; quarantine; vaccine; worldometer; who, corona; who, COVID; corona, dashboard; COVID, dashboard; data visualization, corona; data visualization, COVID; data visualization, dashboard, corona; data visualization, dashboard, COVID; hopkin, corona; hopkin, COVID; social, distance; worldometer, corona; worldometer, COVID

For collecting the related Tweets, we used Twint, an advanced tweet-scraping Python library. In total, we collected 236,782 Tweets posted from 1 January 2020 until 2 September 2020, 1:11 PM (GMT). Our scraping considered both normal tweets and tweets based on GIF/video thumbnails.
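As an illustration, a query for one of the hashtag combinations with Twint might look like the following minimal sketch; the exact configuration used in the study (language filter, output format, result limits) is not documented, so those parameter values are assumptions.

```python
# Minimal Twint sketch for one of the 21 hashtag probes; the language
# filter and output file are illustrative assumptions.
import twint

config = twint.Config()
config.Search = "#lockdown"        # one hashtag combination from the list above
config.Since = "2020-01-01"        # start of the collection window
config.Until = "2020-09-02"        # end of the collection window
config.Lang = "en"
config.Store_csv = True
config.Output = "covid_tweets.csv"

twint.run.Search(config)
```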

Fig. 1. Daily activity of users based on COVID-19 hashtags.

Fig. 2. Step-by-step filtration for dataset preparation.

From the collected Tweets, we found 149,210 unique users. These constitute the initial set of users who showed interest during the COVID-19 outbreak and shared Tweets using the hashtags (Fig. 1). Because the collected timeline includes the pre-COVID year, and personal factors influencing individual sentiment may introduce bias, only users concerned with COVID-19 were chosen as our targets. Collecting the full timeline of all of these users is computationally challenging, so we needed to identify the significant ones. To identify the most active users, we initially ranked them by the number of Tweets they posted. However, upon closer examination, we found that the top users on the list were spammers who repeatedly posted the same tweet in large quantities, indicating that many of these high-volume users were automated bots posting at an extremely high rate. Despite their large output, their activity was concentrated within just a few days.

To filter these users, we sorted them by the number of individual days on which they posted something on Twitter (Fig. 2). We found that the \(1000^{th}\) user’s Tweets were spread over only five days. We selected the top 720 users from the list; the \(720^{th}\) user’s Tweets span only six days, and including more users from the list would not have contributed further to our longitudinal study. We collected all of the public Tweets of these selected users from 1 January 2019 to 24 December 2020. Over the 724 days spanning this period, the selected users posted a total of 7,409,429 Tweets. Their posts also contained 1,233,743 media links that included images and video thumbnails. While collecting these images, we found that 1,198,715 links were valid; the rest were either deleted, made private, or belonged to users who had become inactive altogether.

As we aimed to track changes in user sentiment during 2020 (the COVID-19 period), it was essential that users active in 2020 had also tweeted in 2019. Therefore, we further refined the list of 720 users, selecting only those who tweeted in both 2019 and 2020. This resulted in 569 users, who collectively posted 5,353,462 Tweets and 1,029,444 images. The number would decrease even further if the period were extended.

Methodology

To understand the pattern of sentiment change over cyberspace, we initially collected public thoughts and opinions regarding COVID-19 from social media. We selected Twitter, a popular micro-blogging platform used worldwide for posting ideas and sharing daily life experiences. The justification for choosing Twitter as a sentiment analysis data source is discussed in section “Introduction”, and the general architecture of the study’s structure is shown in Fig. 3.

Fig. 3. The general architecture of this study.

Sentiment detection

After collecting the text and image Tweets, we assigned one of three sentiments (negative, neutral, or positive) to each Tweet. Only English Tweets were considered for text sentiment classification. For text sentiment classification, we developed a heuristic that combines the outputs of both classical rule-based models and modern transformer-based classifiers. For image sentiment classification, we utilized a cross-media learning model initially released in 2017. The detailed process of classification is discussed in the following sections.

Text sentiment classification

Text is the most popular form of idea-sharing on social media. Therefore, accurately classifying the sentiment of shared text is essential. However, traditional sentiment detection libraries often struggle to capture the sentiment of shared Tweets due to the unique styles of expression on social media. To address this, we selected five sentiment classifiers (Table 1) for detecting text sentiment.

Table 1 Baseline text sentiment detection libraries.

All text Tweets were cleaned before sentiment classification using the tweet-preprocessor library available in Python. Irrelevant text elements, such as URLs, emojis, hashtags, mentions, Twitter reserved words (e.g., RT or FAV), smileys, and numbers, were removed during the cleaning process.
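A minimal sketch of this cleaning step with the tweet-preprocessor package (imported as preprocessor) is shown below; the option set mirrors the elements listed above.

```python
# Cleaning sketch using tweet-preprocessor; the removed elements follow
# the list above (URLs, emojis, hashtags, mentions, reserved words such
# as RT/FAV, smileys, and numbers).
import preprocessor as p

p.set_options(p.OPT.URL, p.OPT.EMOJI, p.OPT.HASHTAG, p.OPT.MENTION,
              p.OPT.RESERVED, p.OPT.SMILEY, p.OPT.NUMBER)

raw = "RT @user: Cases rising again #COVID19 https://t.co/xyz :( 2020"
print(p.clean(raw))  # the configured elements are stripped from the Tweet
```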

Most recent Twitter sentiment-related research studies9,49 have relied on VADER (Valence Aware Dictionary and sEntiment Reasoner), a lexicon and rule-based sentiment analysis tool, or TextBlob, a Python library for processing textual data, for text sentiment classification. However, we observed that these classifiers do not always perform well on Tweets. To investigate further, we selected five popular sentiment detection libraries for text sentiment classification. Our goal was to include a diverse set of classifiers to explore how they perform in comparison to human perception, particularly for COVID-19-related Tweets. Therefore, we considered rule-based classifiers (TextBlob, VADER, and Afinn) as well as transformer-based classifiers (HuggingFace and TwitterSentiment). Among these, VADER and Afinn specialize in social media sentiment analysis, while TwitterSentiment is a fine-tuned Twitter sentiment classifier developed in the PyTorch framework. Each of these libraries has its own thresholds for classifying negative, neutral, and positive sentiments, and their working modalities differ substantially from one another.

VADER is a lexicon and rule-based sentiment analysis tool specifically designed for classifying sentiments in social media. It uses a sentiment lexicon, a collection of lexical features (e.g., words) each classified as positive or negative based on their semantic orientation. VADER provides not only positivity and negativity scores but also the degree to which a sentiment is positive or negative. The compound score is calculated by summing the valence ratings of each word in the lexicon, adjusting them according to specific rules, and then normalizing the result to a range of -1 (extremely negative) to +1 (extremely positive). With uniform thresholds, this classifier identifies statements as positive (compound score \(\ge 0.05\)), neutral (compound score between -0.05 and 0.05), or negative (compound score \(\le -0.05\)). In our study, the neutral range for VADER was set as \(-0.05< \text {compound score} < 0.75\).

AFINN is another tool that operates based on a list of words with valence ratings ranging from -5 (negative) to +5 (positive). The AFINN lexicon assigns negative values to words with negative sentiment and positive values to words with positive sentiment. In this study, for the AFINN classifier, we set the neutral range as \(-0.1< \text {sentiment score} < 0.0\).

Additionally, TwitterSentiment is a sentiment analysis software written in Python using the PyTorch framework. The goal of this classifier is to create a sentiment analyzer specifically tailored to the Twitter domain. For TwitterSentiment, in our study, we classified the input as neutral sentiment if the sentiment score is 0.

The Transformers library (previously known as pytorch-transformers and pytorch-pretrained-BERT) provides tens of thousands of pre-trained models for performing tasks across text, vision, and audio. These models, available in over 100 languages, can be used for tasks such as text classification, information extraction, question answering, summarization, translation, and text synthesis. In our study of text classification with this classifier, we set the neutral range as \(-0.7< \text {sentiment score} < 0.7\).

TextBlob is a text processing package for Python 2 and 3. It provides a basic API for performing standard natural language processing (NLP) tasks such as part-of-speech tagging, noun phrase extraction, sentiment analysis, classification, translation, and more. The sentiment property of this classifier returns a named tuple of the form Sentiment (polarity, subjectivity). Here, the polarity score is a floating-point number between -1.0 and 1.0, where -1.0 indicates a negative sentiment and 1.0 indicates a positive sentiment. Subjectivity is also a floating-point value between 0.0 and 1.0, with 0.0 being very objective and 1.0 being very subjective. In our study, we set the neutral range for sentiment classification as \(-0.05< \text {sentiment polarity} < 0.05\).
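To make the thresholds above concrete, the following sketch maps the continuous scores of the three rule-based libraries to discrete labels using the stated neutral ranges; the wrapper itself is our illustration, not code from the original study.

```python
# Sketch: mapping continuous scores to labels with the neutral bands
# stated above. The wrapper functions are illustrative assumptions.
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer
from afinn import Afinn
from textblob import TextBlob

vader = SentimentIntensityAnalyzer()
afinn = Afinn()

def label(score, lo, hi):
    # Scores strictly inside (lo, hi) are neutral; otherwise the position
    # of the score relative to the band decides the label.
    if lo < score < hi:
        return "neutral"
    return "positive" if score >= hi else "negative"

def classify(text):
    return {
        "VADER": label(vader.polarity_scores(text)["compound"], -0.05, 0.75),
        "Afinn": label(afinn.score(text), -0.1, 0.0),
        "TextBlob": label(TextBlob(text).sentiment.polarity, -0.05, 0.05),
    }

print(classify("The vaccine rollout is finally bringing some hope."))
```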

We selected the first 500 text tweets from the initially collected dataset as a sample. Five human observers classified the sentiments of the tweets as either negative, positive, or neutral. The five raters were from computer science backgrounds, aged between 20 and 28 years. All raters were male, proficient in English, and had sufficient experience using social media. Fleiss Kappa is a popular coefficient for measuring inter-rater agreement78,79 in the case of three or more raters. The Fleiss Kappa value for inter-rater agreement was 0.397, which suggests that the five raters had fair agreement80. This level of agreement is understandable because tweets during the COVID-19 period are different from those during a typical non-pandemic period, and it is not uncommon for tweets from the COVID-19 period to be categorized differently by different raters.
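The agreement statistic can be reproduced with statsmodels as sketched below; the `ratings` matrix here is synthetic, standing in for the actual 500 x 5 label matrix produced by the raters.

```python
# Fleiss' Kappa sketch with statsmodels; `ratings` is a hypothetical
# (500 tweets x 5 raters) matrix with labels 0=negative, 1=neutral, 2=positive.
import numpy as np
from statsmodels.stats.inter_rater import aggregate_raters, fleiss_kappa

rng = np.random.default_rng(0)
ratings = rng.integers(0, 3, size=(500, 5))

counts, _ = aggregate_raters(ratings)  # per-tweet counts for each category
print(fleiss_kappa(counts))            # the study reports 0.397 on real data
```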

The classification results from the raters were aggregated using the majority voting approach. These aggregated results were then used to determine the baseline accuracy for our text-based sentiment classification. To achieve this, the same 500 tweets were classified using the five libraries mentioned earlier. We then calculated the accuracy and F-1 score for each library, comparing these results to the baseline classification achieved through majority voting across the five raters. The results obtained are shown in section “Sentiment library evaluation and heuristic development”.

Algorithm 1. Majority voting on all five classifiers (Heuristic-1)

To combine the outputs of the libraries, we tested several combinations, and three of these heuristics are noteworthy (see Table 3, H-1 to H-3). The confusion matrix of H-1 revealed that the libraries tended to have a bias toward marking the tweets as negative. Additionally, H-1 proved less accurate due to the low performance of TwitterSentiment and TextBlob. As a result, we chose three other libraries that showed less bias, and developed H-2 and H-3 based on their outputs.

Upon closer inspection, we found that the HuggingFace Transformer was effective at detecting neutral sentiment while maintaining balance between the other two sentiments. H-2 performed majority voting on the three best libraries (VADER, Afinn, and Transformer); in case of a tie, it checked whether the Transformer model classified the sentiment as neutral, and if it did not, the outcome of VADER was taken as the final decision. H-3 followed the same approach, except that Afinn was used in place of VADER for the final deciding step. This approach improved the accuracy to 68%, outperforming all other results. The pseudo-code for the three heuristics is described in Algorithm 1, Algorithm 2, and Algorithm 3, and a compact implementation sketch follows the algorithm listings.

Algorithm 2. Majority voting on L-1, L-2, and L-4. In case of a tie, decide neutral based on only L-4; otherwise, decide based on only L-1 (Heuristic-2)

Algorithm 3. Majority voting on L-1, L-2, and L-4. In case of a tie, decide neutral based on only L-4; otherwise, decide based on only L-2 (Heuristic-3)

Image sentiment classification

Image sentiment classification was not widely explored in the context of COVID-19 analysis, and most of the attempts we reviewed employed indirect approaches for sentiment detection, as discussed in section “Approaches to sentiment analysis”. We performed sentiment detection using a Cross-Media Learning model81 specifically designed for Twitter image sentiment analysis. For this, we utilized a pre-trained CNN from the T4SA dataset, which specializes in detecting image sentiments collected from Twitter. The original T4SA dataset was created by gathering 3 million tweets containing both text and images. The sentiment of the images was initially labeled based on the associated text, and the authors applied extensive filtering and cross-validation to improve the accuracy of the labels.

The implementation offered several pre-trained models, and we chose to use VGG-19, which achieved an accuracy of 88.1% on the test T4SA dataset. The model provided a vector for each image containing confidence scores for the three sentiment categories. Majority voting was then applied to determine the final sentiment of each image. We selected this VGG-19 fine-tuned model and used the prediction script from the T4SA GitHub repository to make predictions on the images.
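For a single image, the final label can be obtained from the highest-scoring class in the confidence vector, as in the sketch below; the class ordering (negative, neutral, positive) is our assumption about the T4SA output format.

```python
# Sketch: converting the model's per-image confidence vector into a label.
# The class ordering is an assumption about the T4SA output convention.
import numpy as np

LABELS = ("negative", "neutral", "positive")

def image_sentiment(confidences):
    """confidences: length-3 score vector from the fine-tuned VGG-19."""
    return LABELS[int(np.argmax(confidences))]

print(image_sentiment(np.array([0.10, 0.25, 0.65])))  # -> "positive"
```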

User sentiment group migration

Before diving into granular trend analysis, we aimed to generate a high-level overview of how the nature of sentiment changed for each person. Specifically, we wanted to determine whether users who shared tweets with a dominant sentiment in a given year exhibited the same behavior in the following year. To achieve this, we classified users for a particular year (2019 or 2020) based on the following criteria.

  • Negative Users: Users who have posted mostly negative Tweets (text or image) throughout the year.

  • Neutral Users: Users who have posted mostly neutral Tweets (text or image) throughout the year.

  • Positive Users: Users who have posted mostly positive Tweets (text or image) throughout the year.

Here, “mostly” refers to the sentiment with the highest percentage: if the percentage of negative sentiment is the highest, the user is considered “mostly” negative for that year, and the same applies to positive and neutral Tweets. To determine how many people changed their overall sentiment during COVID-19, we divided the tweet and image sentiments into yearly groups (2019 and 2020). If a user’s tweets were mostly negative in 2019, they were labeled a “Negative User” for that year. We categorized every user as a “Negative User,” “Neutral User,” or “Positive User” for both years and observed the changes in the counts. A compact sketch of this categorization follows.
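The categorization reduces to a per-user, per-year argmax over sentiment percentages, as in this pandas sketch with a hypothetical Tweet-level frame.

```python
# Sketch of the yearly user categorization; `df` is a hypothetical frame
# with one row per Tweet and columns user, year, and sentiment.
import pandas as pd

df = pd.DataFrame({
    "user": ["a", "a", "a", "b", "b", "b"],
    "year": [2019, 2019, 2020, 2019, 2020, 2020],
    "sentiment": ["negative", "negative", "neutral",
                  "positive", "neutral", "neutral"],
})

# Dominant (highest-percentage) sentiment per user per year; comparing the
# 2019 and 2020 columns yields the migration counts.
dominant = (df.groupby(["user", "year"])["sentiment"]
              .agg(lambda s: s.value_counts().idxmax())
              .unstack())
print(dominant)
```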

Plotting and change-point detection

After assigning sentiment to each of the text tweets and images, we divided them (referred to as data points) into six individual datasets (see Table 2) based on category (text or image) and the three sentiment types (negative, neutral, or positive). We further aggregated the datasets by counting the number of data points in daily groups, resulting in six time series, each with 724 data points. All sentiment counts within a category (text or image) were then normalized.

Table 2 Six time series generated from the posts of 569 users spanning 724 days.

COVID-19 gained public attention around December 2019 and was declared a global pandemic on March 11, 202012. However, the study in68 suggests that there is typically a delay between the occurrence of an event and its response on social media. To accurately determine when the change occurred for each time series, we applied three different change-point detection algorithms: CUSUM (Cumulative Sum)82, Pelt (Pruned Exact Linear Time) Search83, and DP (Dynamic Programming) Search84. Additionally, we decomposed each series into trends and compared them against the rolling window mean and standard deviation for visual inspection.

The CUSUM (cumulative sum control chart) is a sequential analysis approach85 developed by E. S. Page of the University of Cambridge for statistical quality control. It is commonly used for change-detection monitoring. As its name implies, CUSUM involves the calculation of a cumulative sum, which is what makes it “sequential”. Samples from a process \(x_{n}\) are assigned weights \(\omega _{n}\) and summed as follows.

$$\begin{aligned} S_0&= 0 \end{aligned}$$
(1)
$$\begin{aligned} S_{n+1}&= \max (0,S_{n}+x_{n}-\omega _{n}) \end{aligned}$$
(2)

Here, \(\omega _{n}\) usually represents a likelihood function. A change-point is considered found when the value of \(S\) crosses a threshold.

We used Facebook’s (now Meta) Kats Library to implement CUSUM. The base detector employs a Gaussian distribution model to measure the cumulative sum and assumes a single change-point. The wrapper library runs the base detector multiple times to identify multiple change-points86. The Pruned Exact Linear Time (PELT) search is widely used in research87,88 for change-point detection. It is an exact method that detects change-points by assigning a penalty to changes in a time series. For detecting change-points using PELT search, we used the Ruptures library, a Python module that provides offline change-point detection algorithms. The Dynamic Programming (DP) search identifies the optimal partition where the sum of errors is minimized given a segment model. We applied this search method using the Ruptures library.
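A minimal sketch of the PELT and DP searches with the Ruptures library follows; the cost models and penalty/breakpoint settings are assumptions, since the study does not report them.

```python
# Change-point detection sketch with Ruptures on a synthetic daily series;
# model choices ("rbf", "l2") and the penalty are illustrative assumptions.
import numpy as np
import ruptures as rpt

rng = np.random.default_rng(1)
signal = rng.normal(size=724)   # stand-in for one normalized sentiment series
signal[400:] += 2.0             # synthetic shift to detect

pelt_bkps = rpt.Pelt(model="rbf").fit(signal).predict(pen=10)
dp_bkps = rpt.Dynp(model="l2").fit(signal).predict(n_bkps=1)
print(pelt_bkps, dp_bkps)       # indices of the detected change-points
```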

To facilitate visual inspection, we also decomposed each time series to reveal the underlying trend. Decomposing a time series involves breaking it down into its level, trend, seasonality, and noise components. Time series decomposition is a useful abstract paradigm for understanding temporal data. The trend component represents the long-term evolution of the series. When the data exhibits a consistent increase or decrease, it is considered a trend, and the trend component need not be linear. Additionally, we calculated the mean and standard deviation over a 30-day rolling window for each time series. A sharp spike in the standard deviation graph would indicate changes in the trend, while the plotted average would more clearly illustrate the frequency of these changes.
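The decomposition and rolling statistics can be computed as sketched below; the decomposition period is our assumption.

```python
# Sketch: trend extraction plus 30-day rolling mean and standard deviation.
# The decomposition period (30 days) is an illustrative assumption.
import numpy as np
import pandas as pd
from statsmodels.tsa.seasonal import seasonal_decompose

idx = pd.date_range("2019-01-01", periods=724, freq="D")
series = pd.Series(np.random.default_rng(5).random(724), index=idx)

trend = seasonal_decompose(series, model="additive", period=30).trend
rolling_mean = series.rolling(window=30).mean()
rolling_std = series.rolling(window=30).std()
```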

Effect size of sentiment change

Once the change points were identified, we calculated the change in mean and the effect size (measured by Cohen’s d) for the pre-change-point and post-change-point sentiments. The change in mean provides a generalized idea of the increase or decrease in sentiment. Effect size (Cohen’s d), on the other hand, offers a contextualized measure by accounting for the standard deviation between the two series. It is an appropriate measure for comparing two series where the overall mean differs. Cohen’s d can be calculated using the following formula.

$$\begin{aligned} \text {Cohen's } d = \frac{\text {Mean Difference}}{\text {Standard Deviation}} \end{aligned}$$
(3)

The interpretation of different values for Cohen’s d was taken from89.
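Using the pooled standard deviation, a common convention for Cohen's d, the computation reduces to the following sketch.

```python
# Cohen's d sketch using the pooled standard deviation of the two segments.
import numpy as np

def cohens_d(pre, post):
    n1, n2 = len(pre), len(post)
    pooled_sd = np.sqrt(((n1 - 1) * np.var(pre, ddof=1) +
                         (n2 - 1) * np.var(post, ddof=1)) / (n1 + n2 - 2))
    return (np.mean(post) - np.mean(pre)) / pooled_sd

rng = np.random.default_rng(6)
print(cohens_d(rng.normal(0.30, 0.05, 100), rng.normal(0.35, 0.05, 100)))
```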

Correlation between text and image sentiment

We correlated the text time series (TS-1, TS-2, TS-3) with the image time series (TS-4, TS-5, TS-6) using both parametric (Pearson’s correlation coefficient90) and non-parametric (Kendall’s correlation coefficient and Spearman’s correlation coefficient91) methods. These methods require comparing two series. To track changes over time, we divided the time series into eight quarters (Q1-2019, Q2-2019, Q3-2019, Q4-2019, Q1-2020, Q2-2020, Q3-2020, Q4-2020), each consisting of three consecutive months. The r coefficients were calculated for each quarter, representing the similarity of sentiments between text and image, and plotted with a trend line to show how the correlation changed over time. The interpretation of Pearson’s r was based on92.
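The per-quarter coefficients can be computed as sketched below with SciPy; the daily series here are synthetic stand-ins for one sentiment class.

```python
# Sketch: quarterly Pearson, Spearman, and Kendall correlations between a
# text series and an image series; the data are synthetic stand-ins.
import numpy as np
import pandas as pd
from scipy.stats import pearsonr, spearmanr, kendalltau

idx = pd.date_range("2019-01-01", "2020-12-24", freq="D")
rng = np.random.default_rng(2)
text_ts = pd.Series(rng.random(len(idx)), index=idx)
image_ts = pd.Series(rng.random(len(idx)), index=idx)

for quarter, chunk in text_ts.groupby(pd.Grouper(freq="QS")):
    other = image_ts.loc[chunk.index]
    r, _ = pearsonr(chunk, other)
    rho, _ = spearmanr(chunk, other)
    tau, _ = kendalltau(chunk, other)
    print(quarter.date(), round(r, 3), round(rho, 3), round(tau, 3))
```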

Settling trend identification

We aimed to calculate the time it took for each time series to return to its original form after the COVID-19 outbreak, which would indicate how long it took for people to recover from the pandemic. To quantify the changes in each sentiment time series, we selected the COVID-affected segment and compared it with the non-COVID-affected segment from the previous year up to its change point. Each non-COVID segment was compared with the corresponding series from the change point to December 15, 2020. The resulting segment lengths for the six time series (TS-1 through TS-6) are 313, 233, 283, 293, 263, and 425 data points, excluding the first data point. All segments were smoothed using the LOWESS smoothing algorithm to capture the trend.

The LOWESS filter is a commonly used smoothing method93,94 that applies a locally weighted regression function. It draws a smooth line through a time plot or scatter plot, making it easier to identify relationships between variables and predict trends. This method uses a weighting function, which reduces the influence of adjacent values on the smoothed value at a given point as the distance between them increases.

We calculated the Dynamic Time Warping (DTW) distance for each pair of segments across all time series to assess the extent of change caused by COVID-19 in each series. DTW is a time series analysis algorithm95,96 that measures the similarity between two temporal sequences that may vary in speed. It has been applied to temporal sequences of video, audio, and graphical data. In fact, DTW can analyze any data that can be converted into a linear sequence.
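The smoothing and distance steps can be sketched as follows; the study does not name its DTW implementation, so the dtaidistance package is an assumption here.

```python
# Sketch: LOWESS smoothing (statsmodels) followed by a DTW distance between
# a pre-COVID segment and its during-COVID counterpart (dtaidistance assumed).
import numpy as np
from statsmodels.nonparametric.smoothers_lowess import lowess
from dtaidistance import dtw

rng = np.random.default_rng(3)
pre_covid = rng.random(300)           # stand-in for a 2019 segment
during_covid = rng.random(300) + 0.3  # stand-in for the shifted 2020 segment

def smooth(y, frac=0.1):
    return lowess(y, np.arange(len(y)), frac=frac, return_sorted=False)

print(dtw.distance(smooth(pre_covid), smooth(during_covid)))
```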

Series reconstruction

To visualize how the series would have appeared without the impact of COVID-19, we trained two popular forecasting models using data from one month before the change point of each time series. This approach allows us to observe the effect of COVID-19 on the regular sentiment trends. The models used in our analysis are briefly described below, with each having its own training method and parameters.

Prophet

Prophet is a library developed by Facebook for forecasting time series data97,98. It uses an additive model that handles non-linear trends by incorporating periodic seasonality. The model’s parameters were cross-validated to minimize deviation, and yearly, weekly, and daily seasonality components were enabled during training.
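A minimal Prophet set-up matching this description might look like the following; the placeholder data and forecast horizon are illustrative assumptions.

```python
# Prophet sketch with yearly, weekly, and daily seasonality enabled, as
# described above; training data and horizon are illustrative.
import pandas as pd
from prophet import Prophet

train = pd.DataFrame({
    "ds": pd.date_range("2019-01-01", periods=400, freq="D"),
    "y": range(400),  # placeholder daily sentiment counts
})

model = Prophet(yearly_seasonality=True, weekly_seasonality=True,
                daily_seasonality=True)
model.fit(train)
future = model.make_future_dataframe(periods=300)
forecast = model.predict(future)  # `yhat` gives the counterfactual series
```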

Orbit

Orbit is a Python package developed by Uber for Bayesian time series modeling and inference. To validate the results from Prophet, we employed the Kernel-based Time-varying Regression model. For estimation, we used pyro-svi and incorporated weekly seasonality, with 2021 as the randomization seed. As recommended by the Orbit documentation, the bootstrap value (N) was set to \(10^4\).
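A sketch of that configuration with the orbit-ml package (v1.1+) follows; parameter spellings follow the Orbit docs and may need adjustment for other versions, and the training frame mirrors the Prophet sketch above.

```python
# Orbit KTR sketch mirroring the configuration described above (pyro-svi
# estimation, weekly seasonality, seed 2021, N = 10^4 bootstrap draws);
# exact parameter names are assumptions tied to the orbit-ml version.
import pandas as pd
from orbit.models import KTR

train = pd.DataFrame({
    "ds": pd.date_range("2019-01-01", periods=400, freq="D"),
    "y": [float(v) for v in range(400)],  # placeholder sentiment counts
})

ktr = KTR(response_col="y", date_col="ds", seasonality=[7],
          estimator="pyro-svi", n_bootstrap_draws=10_000, seed=2021)
ktr.fit(df=train)
prediction = ktr.predict(df=train)  # in practice, predict on a future frame
```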

Effect of smoothing filter for forecasting

A smoothing filter often increases forecast accuracy. Therefore, we applied ten different smoothing filters (Exponential Smoothing, Convolutional Smoothing, Spectral Smoothing with Fourier Transform, Polynomial Smoothing, Spline Smoothing, Gaussian Smoothing, Binner Smoothing, LOWESS, Seasonal Decompose Smoothing, and Kalman Smoothing) on the COVID-19 unaffected stable part of each time series (the first 300 days of 2019). The results are discussed in section “Series reconstruction”.
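The ten filters listed above coincide with the smoothers shipped by the tsmoothie package, so we sketch one of them (LOWESS) with that library; the choice of tsmoothie and its parameters are assumptions on our part.

```python
# Smoothing sketch with tsmoothie's LowessSmoother on a stand-in for the
# stable first 300 days of 2019; smooth_fraction is an assumption.
import numpy as np
from tsmoothie.smoother import LowessSmoother

series = np.random.default_rng(4).random(300)

smoother = LowessSmoother(smooth_fraction=0.1, iterations=1)
smoother.smooth(series)
smoothed = smoother.smooth_data[0]  # smoothed series, same length as input
```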

Findings

Identifying genuine users among Twitter bots in order to filter the human Tweets related to COVID-19 was a significant challenge. After shortlisting 569 users, we plotted their tweet sentiment trends over 2019 and 2020 for image and text data separately. Our first major finding in these trend lines was the set of change-points detected in the time series. Secondly, we measured the amount of deviation that took place after the change-point of each series. Finally, we measured the correlation coefficient r between text and image sentiment over the entire period.

Sentiment library evaluation and heuristic development

Five human observers were tasked with classifying the sentiments of 500 randomly selected Tweets as either negative, positive, or neutral. The final classification for each Tweet was determined based on the majority vote, and this result was considered the benchmark for assessing how accurately the text expressed its sentiment. The same 500 Tweets were then analyzed for sentiment using the five libraries previously discussed, and the accuracy and F-1 score for each library were computed in comparison to the baseline accuracy. The results of this comparison are displayed in Table 3 (L-1 to L-5).

Table 3 Evaluation (Precision, Recall, and F-1 score) of five popular libraries and a few heuristics based on human perspective. The best and second-best results are highlighted in bold and italics.

The accuracy reported in the source documentation of the classifiers/libraries differs from what we observed in our study. The original HuggingFace model was fine-tuned with 150k English reviews along with reviews from five other languages. Fine-tuning the HuggingFace Transformer on our own labeled data could further increase accuracy, but 500 labeled samples are generally insufficient for training such a large model, and accurately labeling a significant portion of the collected dataset would require considerable time and effort from individual raters. Additionally, Tweets during the COVID-19 period differ from those in regular times, as highlighted by studies like99 and12. This discrepancy in content led to lower accuracy in sentiment classification for COVID-19-related Tweets compared to the accuracy reported in the formal documentation of the classifiers/libraries.

Fig. 4. Migration of users between various sentiments in 2020.

User sentiment group migration

Figure 4 summarizes the results of our user migration analysis. By marking each user with a sentiment based on the methodology outlined in section “User sentiment group migration”, we tracked their sentiment shifts in 2020. Our findings reveal a significant migration towards neutral sentiment for text-based Tweets. Specifically, 23% of users who were initially negative and 37% of users who were positive shifted to neutral sentiment. Overall, there was a 14% increase in neutral sentiment, a 38% decrease in positive sentiment, and a 6% increase in negative sentiment.

Table 4 Change-points detected by the three algorithms and the final change-point considered.

In contrast, the migration trends for image posts present a different pattern. A significant portion of users shifted towards negative sentiment, with 19% of users migrating from neutral and 45% from positive to the negative sentiment zone. Overall, there was a 29% increase in negative sentiment, an 8% increase in neutral sentiment, and a substantial 64% decrease in positive sentiment.

Change-point identification

The detected change-points are summarized in Table 4. The CUSUM algorithm was unable to detect any change-point for TS-1 and TS-6. However, it closely aligns with the change-points identified by other methods for TS-3, TS-4, and TS-5. In the case of TS-1, since both DP and PELT methods agree on the change-point date, this date is considered the change-point for this time series. Similarly, for TS-4 and TS-5, the detected change-points from both methods coincide, confirming their accuracy. For TS-2 and TS-3, since the methods differ slightly, the average of the detected change-points is used as the final change-point for these time series.

Table 5 Effect size in sentiment trend after the change-points and the interpretation.

The six sentiment trends, along with their corresponding rolling standard deviation, rolling mean, and trend, are presented in Fig. 5. Each graph highlights the change-points with a red vertical line to indicate the moments where significant shifts in sentiment occurred. These visualizations offer an intuitive understanding of how sentiment evolved, with the change-points marking crucial transitions in the trends.

Text sentiments

For the positive text sentiments, the change-point occurred around the middle of February. As shown in the graph, there was a steep decline in the trend at this change-point, coinciding with a sharp drop in the mean after it reached a peak in late February.

For neutral text sentiments, the change-point was calculated to be in mid-May. During this period, both the trend and the mean exhibited a steady decline, while the standard deviation displayed natural fluctuations.

In the case of negative text sentiments, the change-point was identified towards the end of March. At this point, the standard deviation began to rise from a peak low, while both the trend and mean continued to show an upward trajectory.

Fig. 5. Change-points (marked using red vertical lines) for text and image sentiments.

Image sentiments

In analyzing the positive image sentiments, the change-point was identified in early March, marked by a rise in standard deviation from a peak low. The trend and mean for this sentiment continued to decline steadily after the change-point.

For neutral image sentiments, the change-point occurred in early April, where both the mean and trend showed a consistent upward trajectory. At the same time, the standard deviation began to rise from its peak low.

Lastly, the change-point for negative image sentiments was found at the end of October or early November. At this point, the mean showed a steady upward trend, and the time series also displayed an upward trend with continuous fluctuations. The standard deviation fluctuated naturally around this change-point.

Effect size of sentiment change

The effect size for sentiments, as shown in Table 5, is calculated using Cohen’s d, providing a normalized measure of the effect relative to the change in mean. This offers valuable insights. For instance, it is observed that the decrease in neutral text sentiment is more significant than the decrease in positive sentiment, even though the mean change is nearly identical. It also appears that the negative text sentiment reflects the accumulated decrease from both neutral and positive text sentiments.

Regarding image sentiment, it is evident that positive sentiment experiences a substantial loss. This loss is attributed to both neutral and negative image sentiments, with neutral sentiment accounting for the larger portion of the decline.

Settling trend identification

Figure 6 illustrates how the sentiments “settled” back to their normal form after the COVID-19 outbreak and change-point. The decline in DTW for positive text sentiment suggests that the trend quickly returned to normal after June 2020. The DTW distance for both positive and neutral image sentiment remained relatively high, while the distances for neutral text, negative text, and negative image were average. Among these trends, the positive image sentiment showed a steadily increasing distance throughout the COVID-19 period.

Fig. 6. Settling trend after the change-point for the six time series.

Correlation between text and image sentiment

To perform the bimodal analysis, we divided the two-year timeline into eight quarters. Each data point on the X-axis of Fig. 7 represents a quarter of a year, with the corresponding Y value indicating the correlation between text and image sentiment for that quarter’s starting month, along with the next two months.

Fig. 7

Three correlation coefficients (left: Pearson's r, middle: Spearman's r, and right: Kendall's τ) denote similarity patterns between text and image sentiments over eight quarters (2019 to 2020). Each color represents the corresponding sentiment (green for positive, gray for neutral, and red for negative).

Looking at Pearson's r, we observe a strengthening inverse correlation between text and image sentiments as the pandemic progresses. The only exception occurs during the peak of COVID-19, where we see a strong positive correlation between positive text and positive image sentiments. The overall shape of the graph is similar for Spearman's r and Kendall's τ, though Kendall's graph has a smaller overall amplitude. Pearson's r and Spearman's r are almost identical; the only noticeable difference is that the trend line for positive sentiment is slightly steeper than that for neutral sentiment under Pearson's r.

Series reconstruction

We reconstructed the six time series following the steps described in section “Series reconstruction”. The reconstructed series are shown in Fig. 8, and the observations are described below.

Fig. 8

Series reconstruction using the Prophet and Orbit libraries.
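To make the reconstruction step concrete, the sketch below fits both forecasters on pre-change-point data only and projects forward through 2020, which is the essence of the counterfactual reconstruction: the forecasts show how each series might have evolved without the pandemic. We assume Prophet's standard ds/y interface and Orbit's DLT model; the specific Orbit model, the frame name 'daily', and the cut-off date are assumptions.

```python
import pandas as pd
from prophet import Prophet
from orbit.models import DLT

# 'daily' is the full two-year frame with columns 'ds' (datetime) and 'y'
# (daily sentiment proportion); names and the cut-off date are assumed.
cutoff = pd.Timestamp("2020-02-15")
train = daily.loc[daily["ds"] < cutoff]

# Prophet: fit on pre-change-point data, then predict through the end of 2020
m = Prophet()
m.fit(train)
future = m.make_future_dataframe(periods=320)
prophet_forecast = m.predict(future)[["ds", "yhat"]]

# Orbit: same idea with the damped local-trend (DLT) model
dlt = DLT(response_col="y", date_col="ds", seasonality=7)
dlt.fit(df=train)
orbit_forecast = dlt.predict(df=future)  # returns a 'prediction' column
```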

Text sentiments

The reconstruction graph for positive text sentiment reveals that both forecasting tools predicted a steady increase after the change-point in late February, with Orbit projecting a stronger upward shift than Prophet. However, the actual trend showed a sharp decline from late January to early March before returning to pre-COVID levels from March to July, then stabilizing. Notably, actual positive sentiment remained significantly lower than both forecasts.

For neutral text sentiment, Prophet and Orbit diverged considerably after the change-point. The actual trend mirrored Prophet's fluctuation pattern, decreasing from 55% in April to 50% in July before rising back to 55% in December. However, Prophet overestimated neutral sentiment, never dropping below 52%. Orbit predicted a steady 50%, whereas the actual trend fluctuated between 50% and 56%.

Regarding negative text sentiment, Orbit forecasted a stable 24%, while Prophet predicted a fluctuating rise, peaking at 27% in early August and mid-September. The actual trend, however, showed a sharp increase from 24% in mid-February, peaking at 31% in late May before declining to 25% by year-end.

Image sentiments

The analysis of positive image sentiment reveals a sharp decline from about 19% at the change-point in early February to a low of 12% in July before recovering to around 16% in mid-December 2020. However, a further decline to 11% occurred by year-end. This contrasts with the forecasting models: Orbit predicted a steady 21% trend, while Prophet projected a sharp rise to 25% by mid-April, maintaining that level thereafter.

For neutral image sentiment, Prophet forecasted fluctuations averaging 40% from March to October, decreasing to 37% by year-end. Orbit predicted a stable trend at about 37.5%. In contrast, the actual data showed an increase from 39% in early March, peaking at 48% in Q2 before slightly dropping to 44% by year-end.

Regarding negative image sentiment, Orbit predicted a steady, slightly increasing trend averaging 33% from the change-point in October 2019 through 2020. Prophet forecasted a fluctuating trend between 32% and 42.5%. The actual trend aligned more with Prophet’s fluctuating pattern but remained consistently higher, ranging from 35% to 48% throughout the period.

Table 6 Effect of various smoothing algorithms on series forecasting.
Table 7 Interpretation of Pearson’s r correlation between text and image sentiments over the quarters.

Effect of smoothing filter

The results indicate that smoothing has a negligible impact on improving prediction accuracy. Only three of the ten smoothing filters enhanced accuracy for three of the six time series (see Table 6). Therefore, we did not apply an external smoothing filter for prediction. However, to project sentiment trends, the final output series were all filtered using the LOWESS smoothing algorithm (see section “Settling trend identification”).
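For completeness, the presentation-level smoothing can be reproduced with the LOWESS implementation in statsmodels, as sketched below; the frac parameter, which controls the smoothing span, is an assumed value.

```python
import numpy as np
from statsmodels.nonparametric.smoothers_lowess import lowess

# series: a daily sentiment proportion series; frac is the fraction of the
# data used for each local regression (assumed value, tuned for display).
x = np.arange(len(series))
smoothed = lowess(series.to_numpy(), x, frac=0.1, return_sorted=False)
```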

Discussion

In the previous sections, we described how sentiment change was quantified for both text and images. The answers to our research questions can also help researchers better understand how a pandemic shapes mass social changes13,14,15,100, and the results can be used as a benchmark for policymakers and social scientists to assess other global events and future pandemics. In this section, we answer the research questions (RQs) posed in section “Research objectives”. At the same time, we compare our results with other similar works and relate them to comparable events.

Quantification of sentiment change

To quantify sentiment changes, we first identified the change-points for each of the six time series. Sentiment shifts were then measured for both text and images before and after these change-points. This answers our RQ-1.

Text sentiment

In our study, the shift in text sentiments primarily involved neutral sentiments turning negative. The change-point for positive text sentiment occurred earliest (Fig. 5a), followed closely by a rise in negative text sentiment (Fig. 5c). In contrast, the change-point for neutral sentiment emerged much later, approximately 1.5 months afterward (Fig. 5b). This suggests that the initial surge in negative sentiment stemmed from a decline in positive text, while the later increase (after May) was driven by neutral text shifting to negative sentiment.

Image sentiment

The overall effect size noted in Table 5 indicates that sentiment changes were not uniform across text and images, beginning to address our RQ-2. The trend in image sentiments follows a different pattern. As shown in Fig. 5(f), the change-point for negative image sentiment occurred much earlier, in November 2019. Spikes in the standard deviation graph highlight sentiment shifts throughout 2020, most noticeably in January, April, and November. However, these peaks are not significantly different from those in early 2019, suggesting a relatively low direct impact of COVID-19 on negative image sentiment. Instead, the consistently low positive image sentiment (Fig. 5d) appears to be a result of the increased prevalence of neutral image sentiments (Fig. 5e) in 2020.

Effect size of change for sentiments

The effect size calculated in Table 5 addresses our RQ-3. The interpretations in the fifth column indicate that during COVID-19, people shared more negative text (a 10.55% increase) while neutral text decreased (a 4.26% decline). Additionally, they posted more neutral images (an 8.29% increase) while positive images saw a significant drop (a 24.52% decrease) on average.

Correlation in text and images

The longitudinal analysis of image sentiments is another novel aspect of our work. Understanding the correlation trend between text and image sentiments can serve as a benchmark for detecting subtle events akin to a pandemic outbreak. Given that Spearman's r is more sensitive to errors101, the smaller amplitude of Kendall's τ in Fig. 7 suggests a weaker overall non-linear correlation between text and image sentiments. Therefore, the parametric correlation (Pearson's r) warrants the most attention in our analysis.

The quarterly correlation trend for Pearson’s r is detailed in Table 7, which also addresses the second part of RQ-2 (how are image and text sentiments correlated?). Notably, we observe a significant shift in correlation patterns in the post-COVID digital landscape. A weaker inverse correlation followed by a strong positive correlation between text and image sentiments could serve as an indicator of a crisis outbreak. This is evident in Fig. 5(a) and (d), where both text and image positive sentiment trends decline from January to March. Additionally, the overall inverse correlation between text and image sentiments in the neutral category may characterize post-pandemic cyberspace. To the best of our knowledge, this study is the first to explore the correlation between text and image sentiments in this context.

Settling trend and graph reconstruction

From Fig. 6, we observe an increase in deviation across almost all trends, except for positive text sentiment. This graph addresses our final research question, RQ-4. Additionally, there is significant disagreement between the reconstructed graphs generated by Prophet and Orbit for all sentiment categories (see Fig. 8).

Threats to validity

Since numerous global events took place during 2019 and 2020, it is possible that the observed sentiment fluctuations were not solely driven by COVID-19. To address this concern, we carefully selected users who were exclusively focused on COVID-19 through manual inspection before collecting Tweets for analysis. In our study, we initially shortlisted 720 users actively posting COVID-19-related Tweets. However, many of them were inactive in 2019, reducing the user count to 569, with the number decreasing further for later years. To ensure a sufficient number of users were active in both pre- and post-COVID-19 periods, we limited our analysis to 2019-2020. This approach helps control for extraneous variables, minimizing alternative explanations for the sentiment trends observed in our study. It is also worth noting that other recent trend analysis studies typically examine approximately one year48,67. In contrast, our study covers two years (or 724 days), providing a broader temporal scope. Nevertheless, we acknowledge that extending the timeframe further could yield additional insights.

Additionally, to ensure a comprehensive analysis, we considered both text and image content shared on social media. To mitigate selection bias, we sourced COVID-19-related hashtags from established research studies when collecting Tweets. Expanding the scope to include content from other platforms or different hashtag selections could potentially lead to new findings.

Furthermore, we utilized the full sentiment classification range (-1.0 to +1.0) and adhered to the guidelines specified in the documentation of the corresponding sentiment classifier libraries to distinguish neutral sentiment. To prevent bias, we manually inspected the Tweets and did not filter those from any particular user. Consequently, using an alternative definition of neutral sentiment or implementing automated filtering methods could result in different outcomes.

Implications of our study

In our study, we initially identified gaps in the existing literature (section “Research objectives”). These gaps highlight the lack of context that is essential for policymakers and health workers when making administrative decisions. It is important to note that sentiment analysis of social media interactions has already been a focus for policymakers102 and health workers103. By incorporating trend analysis, we aim to assist them in critically evaluating the path to effective decision-making. For example, policymakers and health workers must determine whether an occasional surge or dip in sentiment is due to COVID-19 or merely a seasonal fluctuation. To answer this question, sentiment classification must encompass both text and images, as individuals may express emotions differently through images compared to text. To the best of our knowledge, our study is the first to contribute to this context.

Additionally, a robust text sentiment classification technique is needed, as most social media content is still text-based. However, as demonstrated in this study, many sentiment analysis libraries are inadequate for classifying social media tweets9,16,49, particularly those related to the pandemic46,55. This work contributes by introducing a new, robust text sentiment classification technique. Furthermore, while various frameworks for sentiment analysis exist in the literature7, there remains a need for a comprehensive framework for performing bimodal sentiment analysis to aid researchers with similar interests. In this regard, our study lays the groundwork for such future efforts.

Nonetheless, sentiment analysis of social media has already been applied in a wide range of areas. Examples include analyzing public sentiment toward specific brands102 or products104, opinion mining105,106, predicting stock market movements107, aiding political predictions102,108,109, analyzing legal matters102, supporting e-learning initiatives110, assisting in disaster relief management, and assessing crowd reactions7, among others. In each of these applications, augmenting sentiment analysis with trend analysis through bimodal investigation opens up new avenues for exploration. This study can make a significant contribution to these areas.

Table 8 A qualitative comparison of related research studies across several features, including approach, analysis scope, sentiment analysis tools, sentiment shifts, data collection, and duration of analysis.

Comparison with other similar research studies

Our study's analysis method expands on several earlier studies conducted during previous disease epidemics54,55. It is distinct from other COVID-19-related Twitter or social media analyses in that it includes all COVID-19-related hashtags instead of a single COVID-19-related topic34,44,45. Our research resembles previous work in which social media photographs were employed as research subjects for public health issues69; the distinction is that we specifically selected COVID-19-related images from Twitter, extending that study's earlier scope. Moreover, the shift from clinical to broader socio-economic and political discourses observed in22 on Chinese social media contrasts with our focused hashtag-based approach.

Comparing the outcomes of this study with similar research further supports our findings and observations. For instance, a surge in negative text sentiment is discussed in16, which presents findings that align with part of our observations. However, our analysis offers a fuller picture, identifying two additional surges in April and June. The study in26, which developed a model using Graph Convolutional Networks (GCN) and ensemble learning to detect multimodal negative sentiment, provides further validation of cross-platform applicability. Another study47 concludes that Facebook images related to COVID-19 mostly cluster around neutral sentiment. Although this work focuses on a different platform, its findings are comparable to ours. Therefore, while our study is based solely on Twitter data, the insights we present may also be relevant to other social media platforms.

Another study9 examines COVID-19-related text tweets from the second week of April and finds a higher percentage of neutral sentiment (45%-53%). This observation aligns closely with our findings during the same period (50%-55%). Additionally, the decline in positive text sentiment noted in that study is also visible in our results (see Fig. 5a) around April 2020. A study on tourism-related tweets48, spanning April to December 2020, reports a higher percentage of positive sentiment in the text. Similarly, another study60 detects increased expressions of suicidal ideation and emotional distress in Thai social media, which parallels the rising emotional intensity we observed during similar periods.

The higher positive sentiment reported for tourism-related tweets could potentially explain the steadily rising trend of positive sentiment observed in our study (Fig. 5a) after April 2020. The study by50, however, suggests that optimistic, sarcastic, and joking tweets dominated from May to July, with a much lower ratio of negative sentiments. This finding may align with our results, as our sentiment classifier likely classified the optimistic tweets as positive.

Our conclusions differ slightly from the findings of12. That study reports that the proportions of neutral and negative tweets remained relatively high (approximately 35% to 45%) in the first week of March 2020, before the WHO declared COVID-19 a pandemic. While our observations for neutral sentiment were similar, the negative sentiment was not as high. According to our findings, neutral sentiment remained elevated throughout that period (around 55% that week), while negative sentiment was relatively low at the time (around 25%) and began to rise steadily after the announcement.

Finally, we compare our sentiment trends with those observed in other global events. For instance, the study in111 indicates that negative sentiment in tweets increased significantly after the onset of the Russia-Ukraine conflict. Another study112 examined stock returns in the volatile Zimbabwean market from February 2019 to June 2020 and found strong coherence between Twitter sentiments and stock price sentiments. That study also reported a significant decline in positive sentiment from July 2019 to October 2019, as well as in March 2020, which aligns with our findings. For a more detailed comparison, Table 8 presents a qualitative analysis of related research studies covering different aspects.

Avenues of future work

Our study focused on analyzing the trends of positive, negative, and neutral sentiments, providing evidence of distinct bimodal correlation patterns during the COVID-19 outbreak. While we attempted to explain the major sentiment shifts based on existing literature, many of the smaller changes could also be explained through topic modeling applied to specific time segments of the tweets. Incorporating additional metadata, such as geographic location, user background, or contemporaneous event data, would offer a broader context for understanding the changes in sentiment trends. We used Twitter as our data source, which may be a limiting factor, as this social media platform is not universally popular across all countries. While Twitter is a rich source of information, incorporating additional platforms such as Facebook, Instagram, and Reddit is a common practice among researchers113,114 working on similar problems and would enhance the comprehensiveness and generalizability of the findings.

Initially, we applied image captioning using the Inception V3 model to gather context from the images. However, the generated captions were largely inaccurate and, therefore, not usable for our analysis. Clustering similar images or generating metadata, as discussed in49, could enhance the qualitative analysis presented in this study. Furthermore, causal analysis of sentiment trends expressed through images would offer additional insights in future research. On the other hand, the reliance on the T4SA dataset for image sentiment classification limits the adaptability of the analysis to the COVID-19 context. Employing more advanced and versatile multimodal models, such as CLIP, with fine-tuning or retraining could significantly improve the accuracy and relevance of image sentiment analysis.

Some tweets may contain conflicting sentiments between the text and images. In certain cases, tweet images might also include text. Analyzing such multi-modal tweets could open up new research avenues. Also, the paper uses a small, manually labeled dataset (500 samples) for validating the sentiment classifiers, which may not be sufficient to ensure robust performance. Incorporating advanced large language models like GPT-4 for sentiment classification could provide a more efficient and accurate analysis, given their proven capabilities in natural language understanding.

Many related studies have employed Natural Language Processing (NLP) techniques, such as Latent Dirichlet Allocation (LDA) topic modeling, named-entity recognition, summarization, and others, to analyze collected tweets. This type of analysis can provide insights into what people are discussing and offer a deeper understanding of why sentiment trends evolved as they did. Additional NLP techniques, such as N-gram modeling and summarization, could also yield valuable insights from the tweets.

It is also worth noting that some tweets are geo-tagged, which allows for the extraction of location data from the collected tweets, if available. Even when location data is absent, the author’s location can often be inferred from the tweet’s time zone. Grouping tweets based on geographic location can help identify notable patterns in sentiment trends. A spatial analysis of sentiments in tweets is an area that we plan to explore in future work.

Conclusion

In our study, we perform a comprehensive exploration of human sentiments expressed in cyberspace during a pandemic, with the necessary context from the previous year. Observing sentiment changes to assess the present condition is a well-known technique, especially in e-commerce and among public health associations. Using the previous year's data as context for comparing the changes, although common in other studies, is an approach unique to COVID-19 sentiment analysis, and to the best of our knowledge, this is the first such work.

Furthermore, the patterns of human interaction with COVID-19-related data and visualizations uncovered in this study provide valuable insights into mass responses during a pandemic, capturing overall human sentiments both during and after the crisis. Additionally, we propose an enhanced heuristic for sentiment analysis in text-based tweets.

Moreover, we introduce a novel bimodal analysis framework that examines human sentiments expressed through both text and images. Our study identifies shifts in sentiment within textual and visual content during the COVID-19 pandemic and quantifies these changes using a coefficient-based approach. We hope that the insights gained from our analysis will contribute to a deeper understanding of crises like pandemics and help guide future research during national and international emergencies.