Introduction

Drought is a naturally occurring disaster that begins with persistent precipitation deficits (meteorological drought) and propagates into soil moisture deficits (agricultural drought) and extreme low-flow conditions (hydrological drought) (Van Loon, 2015). Current drought monitoring systems track a range of physical drought conditions, including deficits of precipitation (meteorological), soil moisture (agricultural), and streamflow (hydrological) (Hao et al., 2017). Socioeconomic drought is defined as a drought that can cause not only catastrophic economic losses but also psychological degradation of the public (Stain et al., 2011; Favero and Sarriera, 2012; Abunyewah et al., 2024; Parida et al., 2018). The repercussions of drought on agriculture manifest as production losses and groundwater scarcity (Mieno et al., 2024). Farmers who experience drought-driven crop failure suffer from deteriorating mental health (Hanigan et al., 2012), eventually leading to increased suicide rates (Lee et al., 2020). Socio-psychological impact assessments of drought have often focused on local stakeholders and have been conducted through surveys or interviews and through social modeling of theoretical or idealized societies, which provides limited understanding of how real communities communicate the risk of an emerging drought.

The concept of the “human social sensor” has been proposed to advance our understanding of how human social cognition interacts with social environments, using Big Data from web search engines and social media (Galesic et al., 2021). Such digital trace data provide opportunities to investigate changes in social behavior patterns within coupled physical-social systems during natural disasters such as droughts (Kim et al., 2019; Ahmad and Kam, 2024), floods (Du et al., 2017; Han et al., 2022), and heatwaves (Zander et al., 2023). During the 2011–16 California drought, a surge of news articles about the governor’s declaration of a drought emergency led to the peak of drought-related internet search activity (Quesnel and Ajami, 2017; Kam et al., 2019). Furthermore, shifts in the sentiment of social media users can be monitored and can serve as a rapid indicator of heatwaves, particularly in densely populated urban areas (Dzyuban et al., 2022). Assessments of socioeconomic drought based on multiple digital trace data can advance our limited understanding of how attention in mass and social media interacts with public interest in an emerging drought, which will provide insights into how to improve the resilience of our communities to drought.

Recently, South Korea experienced a severe drought caused by long-lasting precipitation deficits. The drought became nationwide in early 2022, when a surge of online information-seeking activity was reported and most reservoir lakes fell below 30% of their maximum water storage volume (Park et al., 2023; Lee and Kam, 2024). In the late summer of 2022, Typhoon Hinnamnor brought intense rainfall over the central and southeastern regions, which shrank the drought extent to the southwestern region (Park et al., 2023; Kim et al., 2024). In April 2023, the drought was the most severe since 1973, with emergency vehicles delivering drinking water to villages over six months (Jo, 2022). Intense rainfall over the southwestern region abruptly terminated the unprecedented drought in early May 2023 (Park et al., 2023). This multi-year drought provides a unique opportunity to understand dynamic changes of attention in the mass and social media of Korea, where multiple digital trace data are available.

In South Korea, NAVER is the dominant search platform, reaching about 40 million users every day (NAVER Corporation, 2022). From January 2022 through May 2023, NAVER held 60.6% of the search engine market in South Korea, followed by Google (29.2%) and Daum (4.8%) (https://internettrend.co.kr/trendForward.tsp). Among social media, KakaoTalk, an instant messaging app, had the highest usage rate (97.2%) in 2021 according to the Korea media panel survey of 3000 participants, followed by YouTube (86.5%) and NAVER cafe (70.0%) (Korea Press Foundation, 2021). Twitter/X was used by 21% of the respondents. Owing to data availability, this study uses Twitter/X data, which reflect South Korean society’s high demand for rapid information dissemination, social influence, and public discussion space in the digital age (Yoo et al., 2014).

Here we examine how changes in the characteristics of the 2022–23 South Korea drought (intensity and areal extent) affect attention in social and mass media via a multi-pronged approach that integrates multiple social monitoring data, including internet search activity volume data, Twitter/X data, and news article headlines. We calculate a daily drought index, the self-calibrating Effective Drought Index (scEDI; see Methods), at the province level of South Korea to monitor changes in the characteristics of the 2022–23 drought (Park et al., 2022). We use natural language processing to conduct a sentiment analysis of the headlines of 15,458 news articles from 122 local media companies on NAVER and of about 0.8 million tweets collected from January 2020 through May 2023, a period that covers the 2022–23 Korea drought. We furthermore assess dynamic patterns of online information search activity from the NAVER DataLab and Google Trends platforms to examine associations of public interest in the ongoing drought with attention in mass and social media.

Literature review

Drought is a slow-growing disaster compared with other natural disasters such as floods and earthquakes, resulting in different socio-psychological impacts. Individuals tend to consume content aligned with their existing interests, attitudes, or perceived relevance (Stroud, 2008; Pariser, 2011), and avoid or overlook information they deem unimportant or emotionally taxing, such as slow-developing environmental issues like droughts (Garrett, 2009; Iyengar and Hahn, 2009). Moreover, slow-developing environmental risks like droughts fail to break through because they lack immediate emotional hooks or urgency (Anderson and Huntington, 2017; Schäfer and Painter, 2021).

A substitution effect between news media and social media has been assessed, whereby audiences may rely on one medium (e.g., television or radio) in lieu of others (e.g., social media), particularly in local contexts or during specific types of events. Prior research suggests that media substitution can occur when users’ media consumption patterns shift based on accessibility, trust, or familiarity with different platforms. For instance, increasing digital media use often replaces, rather than supplements, traditional media use (Taneja et al., 2012). The selection of media may differ across age groups and event types (Stroud, 2014; Nguyen, 2013). In the context of environmental risk communication, audiences often turn to local TV or radio for immediate and trusted information during disasters, while social media may serve different roles, including peer-to-peer sharing, emotional expression, or amplification (Peacock et al., 2020; Houston et al., 2015).

While both traditional and social media serve functions in risk communication, their roles differ significantly. Traditional media (e.g., television, radio, newspapers) is often perceived by the public as more credible and authoritative during crises, especially in emergency situations requiring official information dissemination (Liu et al., 2011). In contrast, social media platforms offer speed, interactivity, and decentralized information sharing, making them particularly useful for real-time engagement and participatory risk planning (Houston et al., 2015). However, the rapid update cycle and algorithm-driven content delivery of social media can lead to fragmented attention, where critical information may be quickly buried under trending or entertainment-driven content. This dynamic poses challenges to maintaining sustained public awareness during slow-onset disasters such as droughts (Lorenz-Spreen et al., 2019).

Social media data can characterize the disaster across space and over time, which is applicable to provide real-time situational awareness (Wang et al., 2016). The real-time situational awareness is essential for adaptive and effective response plans. The functional framework for social media use in emergency management helps articulate how social media enables not just information sharing, but also resource coordination, volunteer mobilization, and the provision of social support, which are all critical elements for building robust disaster resilience beyond traditional governmental responses (Houston et al., 2015).

Social media interacts with mass media during the emergence of high-impact events such as natural disasters and pandemics. Recent research suggests that users’ media environments are increasingly curated by algorithms and personal preferences, which shape exposure toward emotionally engaging, visually intense, or politically polarizing content (Dubois and Blank, 2018; Eady et al., 2019; Boulianne et al., 2022). Selective exposure may cause individuals to ignore climate-related content unless it aligns with their existing beliefs or values, particularly when such issues are perceived as distant, gradual, or technically complex (Pearce et al., 2019; O’Neill and Nicholson-Cole, 2009).

Interactions between news and social media in risk communication do not run in a single direction. The slow spread of drought makes it difficult for the public to notice the ongoing drought, resulting in low online search activity volumes until the drought condition is extremely severe (Kam et al., 2019). In fast-growing disasters like earthquakes and floods, social media and news have a reciprocal effect on each other. For example, social media activity significantly increased the sheer volume of broadcast information, intensifying awareness and providing timely updates during heavy snow and rain events (Panagiotopoulos et al., 2016), which underpins how social media enhances early warning and public understanding, directly supporting preparedness and rapid response. The interactions between media can influence public perception and behavior, as described by the social amplification of risk framework (SARF) (Kasperson et al., 1988). According to SARF, social media acts as a powerful “amplification station”, where risk signals, both accurate and inaccurate, can be intensified or attenuated through psychological, social, and institutional processes.

Previous studies of risk communication during disasters have been conducted in fast-growing disaster contexts such as extreme precipitation, floods, and earthquakes, but not in slow-growing disaster contexts. They have also mainly focused on the social response from a single source, either web search engines (Kam et al., 2019), social media (Tang et al., 2015; Liu and Han, 2025), or news media (Henrique Lima Alencar et al., 2024; Mahmud and Osawa, 2025). In South Korea, attention during drought has been predominantly directed toward ‘water deficit’ and ‘water security and support’, while ‘economic damages’ and ‘environmental and sanitation impact’ have received comparatively less emphasis (Lee et al., 2022). The 2022–23 South Korea drought has typical characteristics of slow-growing disasters. The drought slowly emerged, persisted with nationwide coverage in late 2022, and then intensified in early 2023, centering on the southwestern region, a breadbasket of South Korea. These complex spatiotemporal evolutions of the 2022–23 drought provide an opportunity to test and extend existing theories in drought-specific contexts, including the situational crisis communication theory (Coombs, 2007), SARF theory (Kasperson et al., 1988), information diet theory (Johnson, 2012), selective exposure theory (Knobloch-Westerwick, 2014), and media substitution theory (Kaye and Johnson, 2003).

Methods

Data

In this study, daily precipitation records from 103 Korea Meteorological Administration (KMA) stations in South Korea are aggregated at the province level over 1980–2023 (https://data.kma.go.kr; see the province names and acronyms in Supplementary Table 1). Weekly and daily online information search activity volume data (2022–2023) are retrieved from Google Trends (https://trends.google.com/trends/) using the disaster type term ‘Drought’, and from NAVER DataLab (https://datalab.naver.com/) using the search term ‘Drought’ in Korean (가뭄). The online search activity volume is expressed relative to the maximum activity volume during the chosen search period and ranges from 0 to 100. The Google Trends platform offers a disaster type option when a search term is related to a disaster such as drought or earthquake. This option returns internet search activity relevant to the disaster of interest (here, drought) across different languages. Search activity volume data have been used as a proxy of public awareness of, or interest in, a disaster such as drought or earthquake because they are passively monitored and collected (Quesnel and Ajami, 2017; Kam et al., 2019; Gizzi et al., 2020). It is worth noting that Google Trends and NAVER DataLab provide no official documentation of how their relative online search activity volumes are calculated.

Google Trends represents user search records from Google Search, Google News, and YouTube. NAVER DataLab reflects search frequency on the NAVER search engine. Both Google and NAVER are widely used in Korea, but their user bases are not identical. NAVER is the dominant domestic search engine; it best reflects the mainstream information-seeking patterns of the public and provides more extensive and timely suggestions. Google is widely used by younger and internationally oriented users; it captures more globalized search activity and shows different patterns, particularly in English or internationally oriented queries. By combining both sources, we evaluate the consistency of public information-seeking activity patterns during the 2022–23 drought.

This study harnesses multiple digital trace data. Objective drought severity is characterized by the precipitation-based drought index, scEDI. The Google Trends and NAVER DataLab data trace public information demand over the study period. News headlines capture how information is framed for the public. Twitter/X captures the amplification of social risk through two-way risk communication. This study aims to answer the following questions: What temporal relationships can be observed between public search behavior, news coverage, and social media activity during the 2022–23 South Korea drought? What measurable features define drought-related risk communication in news outlets and social media, and how can these be systematically evaluated? Do news outlets and social media exhibit substitutive or complementary effects in their influence on public attention and discourse throughout the 2022–23 South Korea drought? How does the drought severity/extent affect dominant sentiments in risk communication patterns? Do these sentiments differ between news headlines and tweets? The news and online search activity data can be used to test their complementary relationship during the 2022–23 drought. Temporal analysis of news and online search activity volumes together with the Twitter/X data can monitor social risk amplification, as indicated by lead/lag correlations between news or online search activity volumes and tweet volumes. By answering these questions, this study will provide insights into how to enhance public situational awareness through interactions among local news outlets and social media and improve the efficiency of drought contingency plans through public support.

In this study, 74,042 news articles from 798 national and local media companies are collected using the search term ‘Drought’ in Korean from January 2020 through May 2023. This period is chosen because 2020 is a pluvial year, free from the impact of an antecedent drought, and the 2022–23 drought terminated in early May 2023. In South Korea, news media companies can be categorized into two groups: national media companies that cover national and local socioeconomic issues, and local media companies that primarily focus on local issues (Special Act on Assistance in Development of Local Newspapers, Article 2; www.law.go.kr). Here, we focus on local news outlets, which primarily cover issues occurring in their province. The news article data are accessible from NAVER news (https://news.naver.com) and are collected using two Python packages, ‘BeautifulSoup’ (https://www.crummy.com/software/BeautifulSoup/) and ‘Selenium’ (https://selenium.dev), for web scraping and browser automation. The news article data include the name of the news media company, publication date, headline, and content. This study focuses on sentiment analysis of the news headlines rather than the news contents, since news headlines tend to be subjective, which produces large variations in emotion scores. Lastly, 15,458 news articles from 122 local media companies are used to construct province-level emotion maps. Local news agencies are aggregated into 16 provinces of South Korea (see Supplementary Table 1).

Here, 799,284 Twitter/X posts are collected using the keyword ‘Drought’ in Korean from January 2020 through May 2023. The Twitter/X data include user ID, timestamp, content, retweet history, emojis, and web links. Most of the collected Twitter/X posts have no geotag information. In the sentiment analysis, web links are excluded. In addition, emojis are removed because their meaning varies with context. For example, the crying emoji can express sadness or joy (e.g., ‘tears of joy’), so determining its exact meaning requires a sophisticated, context-aware method. Some Twitter/X posts recur frequently with identical entertainment-related content from different IDs; these are excluded from the sentiment analysis. High-frequency posts unrelated to drought, such as those associated with the keywords ‘BlackPink’, ‘BTS’, ‘EXO’, and other Korean pop artists or entertainment events, are manually identified and removed. Lastly, this study conducts the sentiment analysis of the 769,228 remaining Twitter/X posts, which are retained as drought-related. To further assess the validity of the data, we randomly select three samples of 1000 posts each and manually annotate them, identifying 702, 732, and 748 posts as drought-related, respectively (see Annotations in Supplementary Materials). Most of the irrelevant posts use the word “drought” as a metaphor for something missing or rare.
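
The cleaning steps described above can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function names, the URL pattern, and the rough emoji code-point ranges are assumptions made for illustration, and only the three keywords named in the text are used as the off-topic list.

```python
import re

# Assumed patterns (not from the study): a simple URL matcher and a rough
# range of emoji code points; a production filter would need a fuller list.
URL_RE = re.compile(r"https?://\S+")
EMOJI_RE = re.compile("[\U0001F300-\U0001FAFF\u2600-\u27BF]")

# Keywords named in the text as markers of off-topic entertainment posts.
OFF_TOPIC = ("BlackPink", "BTS", "EXO")


def clean_post(text: str) -> str:
    """Remove web links and emojis, then collapse repeated whitespace."""
    text = URL_RE.sub("", text)
    text = EMOJI_RE.sub("", text)
    return " ".join(text.split())


def keep_post(text: str) -> bool:
    """Return False when a post mentions one of the off-topic keywords."""
    lowered = text.lower()
    return not any(keyword.lower() in lowered for keyword in OFF_TOPIC)


posts = ["가뭄 심각 😢 https://t.co/abc", "BTS 콘서트 티켓 가뭄"]
cleaned = [clean_post(p) for p in posts if keep_post(p)]
```

In the study itself the off-topic posts are identified manually, so a keyword filter of this kind would only be a first pass before manual review.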

Calculation of drought index

Here we calculate the self-calibrating Effective Drought Index (scEDI) to detect and characterize the daily evolution of the 2022–23 drought in South Korea (Park et al., 2022). scEDI is a drought index designed to monitor and characterize the condition of an emerging drought at a daily scale, calibrating the reference climatology itself from the preceding 30-year moving window. Based on scEDI, the drought condition is categorized into three levels: moderate (−0.5 ≥ scEDI > −1.5), severe (−1.5 ≥ scEDI > −2.5), and extreme (−2.5 ≥ scEDI) (Park et al., 2022). Although scEDI is calculated from daily precipitation data alone, it accounts for the propagation of accumulated daily precipitation deficits into soil moisture deficits, as represented in the Global Land Data Assimilation System (GLDAS; Rodell et al., 2004) product, and into total terrestrial water thickness deficits, as observed in the Gravity Recovery and Climate Experiment (GRACE; Landerer and Swenson, 2012) satellite data (see Fig. S4 of Park et al. (2022)). This makes scEDI useful for monitoring agricultural and hydrologic droughts in regions where long-term daily precipitation data are available but soil moisture and hydrologic data are not.
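
As a minimal sketch of the category thresholds quoted above, the mapping from a scEDI value to a drought level could be written as follows. The function name and the ‘no drought’ label for values above −0.5 are illustrative assumptions and are not taken from Park et al. (2022).

```python
def classify_scedi(scedi: float) -> str:
    """Map a scEDI value to the drought level quoted in the text."""
    if scedi <= -2.5:
        return "extreme"      # -2.5 >= scEDI
    if scedi <= -1.5:
        return "severe"       # -1.5 >= scEDI > -2.5
    if scedi <= -0.5:
        return "moderate"     # -0.5 >= scEDI > -1.5
    return "no drought"       # assumed label for scEDI > -0.5


levels = [classify_scedi(v) for v in (-0.2, -1.0, -1.9, -2.5)]
```

The boundary values −0.5, −1.5, and −2.5 fall into the moderate, severe, and extreme levels, respectively, matching the inequalities in the text.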

In this study, scEDI is calculated using daily precipitation for 16 provinces of South Korea. The 2022–23 drought becomes severe in mid-May 2022, with the peak intensity (scEDI of −1.9) on 21 May 2022. We focus on two periods of the 2022–23 drought because surges of negative news headlines and negative Twitter/X posts occur during these periods. The first period is defined as the national drought conditions from May through June 2022 (“I” in Fig. 1), which end when the central Korea region recovers from the drought. We define the second period as the local drought conditions from February through March 2023 (“II” in Fig. 1) because Twitter/X posts with negative dominant emotion types are few in April and May 2023, and most provinces recover from the drought condition, with scEDI values of −0.5 or above, in early May.

Fig. 1: Spatiotemporal characteristics of the 2022-23 South Korea drought.

Province-level monthly scEDI values (a) and the spatial distribution of scEDI score under the national (b) and local drought conditions (c). (I) and (II) in (a) depict the national and local drought periods, respectively.

Sentiment analysis

Here we employ the Korean comments Efficiently Learning an Encoder that Classifies Token Replacements Accurately (KcELECTRA; https://github.com/Beomi/KcELECTRA) model, fine-tuned on the Korean Online That-gul Emotions (KOTE) dataset, for sentiment analysis of Twitter/X posts and NAVER news headlines (Jeon et al., 2024). The KcELECTRA model is an ELECTRA model pretrained on more than 18 million Korean texts. It was fine-tuned on the 50k KOTE data, which were collected from 12 different platforms and various domains, including news, social media, and video platforms. This study follows the hyperparameters of the fine-tuned KcELECTRA-KOTE model provided in Jeon et al. (2024). During inference, the sigmoid function is applied to the model outputs to obtain the probability scores for classification. The batch size is 32, and the maximum input length is 512 tokens. It is worth noting that each tweet was limited to 280 characters until early 2023 and that the news headlines used in this study are shorter than the maximum input length.

The KcELECTRA model fine-tuned on KOTE also showed better performance than the model fine-tuned on the translated GoEmotions dataset and than other Korean NLP models trained on a different dataset, the Naver Sentiment Movie Corpus (NSMC; https://github.com/e9t/nsmc). The KOTE-tuned KcELECTRA model has a wider range of label dimensions than models based on other Korean emotion datasets, such as translated GoEmotions (27 emotion types), which is beneficial for monitoring and characterizing the sentiment and emotion of texts. KcELECTRA provides scores for 44 emotion types, consisting of 14 positive (e.g., ‘Expectancy’, ‘Admiration’, ‘Excitement’), 25 negative (e.g., ‘Sadness’, ‘Dissatisfaction’, ‘Embarrassment’), 4 neutral (‘Arrogance’, ‘Surprise’, ‘Realization’, ‘Resolute’), and 1 ‘no emotion’ type, following the classification used in Jeon et al. (2024).

In this study, KcELECTRA is applied to each Twitter/X post and news headline. KcELECTRA provides scores for the 44 emotion types of each text, which poses a multi-label classification problem because a sentence can exhibit more than one emotion label simultaneously. Here, the dominant emotion type of a news headline or tweet is the emotion type with the highest score. The polarity (negative/positive) of the news headline or tweet is then determined by whether its dominant emotion type belongs to the negative emotion group. It is worth noting that the results from the first dominant emotion type are consistent with those from the first two dominant emotion types, which confirms the representativeness and robustness of the single dominant emotion-focused analysis.
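
The dominant-emotion rule can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the study's code: the raw model outputs are represented as a plain dictionary, the function names are hypothetical, and the negative group lists only the three negative labels named in the text instead of all 25.

```python
import math

# Illustrative subset only: the three negative labels named in the text.
NEGATIVE_EMOTIONS = {"Sadness", "Dissatisfaction", "Embarrassment"}


def sigmoid(x: float) -> float:
    """Convert a raw model output into a probability-like score."""
    return 1.0 / (1.0 + math.exp(-x))


def dominant_emotion(raw_outputs: dict[str, float]) -> tuple[str, bool]:
    """Return the highest-scoring emotion and whether it is negative."""
    scores = {label: sigmoid(value) for label, value in raw_outputs.items()}
    top_label = max(scores, key=scores.get)
    return top_label, top_label in NEGATIVE_EMOTIONS


# Hypothetical raw outputs for one headline (values are made up).
example = {"Sadness": 2.0, "Expectancy": -1.0, "Surprise": 0.3}
label, is_negative = dominant_emotion(example)
```

Because the sigmoid is monotonic, the highest raw output and the highest score always point to the same label; the scores matter only when a threshold or the two highest emotions are inspected.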

Emotion scores from news headlines and Twitter/X posts are randomly sampled with replacement 1000 times (that is, bootstrapped) at the national level to estimate the uncertainty of the percentages of the dominant emotion types of tweets and news headlines during the national and local drought conditions. To assess the impact of the sampling method, we also conduct stratified sampling of news headlines by province during the national and local drought conditions. It is worth noting that Twitter/X provides no personal information for users who opt out of sharing it, which makes it difficult to assess the impact of sampling decisions on tweets.
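
A minimal sketch of the bootstrap step is given below, assuming that each text has already been reduced to its dominant emotion label. The function name, the fixed random seed, and the use of the 2.5th and 97.5th percentiles as the reported range are assumptions for illustration; the stratified variant would repeat the resampling within each province.

```python
import random


def bootstrap_share_interval(labels, target, n_boot=1000, seed=0):
    """Percentile interval (in %) for the share of `target` among labels."""
    rng = random.Random(seed)
    n = len(labels)
    shares = []
    for _ in range(n_boot):
        # Resample the labels with replacement, keeping the sample size.
        resample = rng.choices(labels, k=n)
        shares.append(100.0 * resample.count(target) / n)
    shares.sort()
    lower = shares[int(0.025 * n_boot)]
    upper = shares[int(0.975 * n_boot) - 1]
    return lower, upper


# Hypothetical labels for 100 headlines (counts are made up).
labels = ["Expectancy"] * 40 + ["Anxiety"] * 30 + ["Disappointment"] * 30
low, high = bootstrap_share_interval(labels, "Expectancy")
```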

Province areas on a geographical map can distort the depiction of social responses through web search engines, news media, and social media, particularly for densely populated areas such as metropolitan cities. For visualization, we display the polarity of news headlines and Twitter/X posts on a province-weighted map and the spatial distribution of scEDI values on the geographical map. We create this province-weighted Korean map based on the number of districts (e.g., si/gun/gu) within the corresponding province (e.g., metropolitan cities/administrative districts) (Supplementary Fig. 1).

Results

Interplay of the 2022–23 drought with local news outlets and social media

The 2022–23 South Korea drought follows the pluvial years of 2020–21. From August 2020, most provinces of South Korea show positive scEDI values. In August 2020, intense rainfall hits the southern cities and provinces, including Busan (BS), Gwangju (GJ), and Jeollabuk-do (JB), where the scEDI value is above +2.0 (green colors in Fig. 1a; see the full names/acronyms of provinces and cities in Supplementary Table 1). From July 2021, the scEDI values decrease gradually, and most cities and provinces of South Korea show scEDI of −1.0 (moderate drought) or below in May and June 2022 (dark purple colors in Fig. 1a, b), indicating the occurrence of a national drought (the national drought period; “I” in Fig. 1a). The result confirms that intense rainfall from Typhoon Hinnamnor in late August 2022 terminates the drought over the central Korean region (positive scEDI values). The drought persists over the southern Korea region, where rainfall is insufficient for drought recovery, and the drought condition deteriorates until April 2023. Here the local drought period is from February through March 2023 (“II” in Fig. 1a).

Negative news headlines and tweets increase when the scEDI values are negative (Fig. 2a). Overall, negative Twitter/X posts are dominant during the national drought period. About 1000 tweets are posted in August and November 2022 after the peak in June 2022, coinciding with an increase of negative news headlines. The total number of negative Twitter/X posts is over 180,000 (76% of the total tweets) during the national drought period. The daily maximum number of negative tweets is about 39,000 (93%) on 17 June 2022, and 35 news headlines (27%) are negative on 7 June 2022. During the local drought period, the numbers of positive and negative news headlines are similar (357 (55%) and 294 (45%), respectively). Negative news headlines are spread mainly by local news outlets in the southwestern Korea region, without a significant increase of negative tweets. The cross-correlation analysis shows that scEDI is most strongly correlated with news headlines (Twitter/X posts) at a 17-day (26-day) lag (r = −0.381, p < 0.001) (Supplementary Fig. 2). Between news articles and Twitter/X posts, the cross-correlation coefficients are high when news articles lead by three days (r = 0.439, p < 0.001) and by nine days (r = 0.414, p < 0.001).
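
A lagged cross-correlation of this kind can be sketched as follows, assuming two equally long daily series. The function names are hypothetical, the example series are synthetic, and significance testing (the p values reported above) is omitted; this is an illustration of the technique and not the analysis code used in the study.

```python
def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sum((a - mean_x) ** 2 for a in x) ** 0.5
    sd_y = sum((b - mean_y) ** 2 for b in y) ** 0.5
    return cov / (sd_x * sd_y)


def strongest_lag(driver, response, max_lag):
    """Correlate driver[t] with response[t + lag] for lag = 0..max_lag
    and return the lag with the largest absolute correlation."""
    correlations = {}
    for lag in range(max_lag + 1):
        x = driver[: len(driver) - lag]   # drop the last `lag` days
        y = response[lag:]                # drop the first `lag` days
        correlations[lag] = pearson(x, y)
    best = max(correlations, key=lambda k: abs(correlations[k]))
    return best, correlations[best]


# Synthetic example: the response repeats the driver two days later.
driver = [0, 1, 0, 3, 0, 2, 0, 5, 1, 0]
response = [0, 0] + driver[:8]
lag, r = strongest_lag(driver, response, max_lag=3)
```

In this convention a positive lag means that the first series leads the second, which corresponds to scEDI leading news headlines or news articles leading Twitter/X posts in the text.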

Fig. 2: Spatiotemporal variations of attention from Twitter/X and news media in the 2022–23 Korea drought.

a Monthly total numbers of tweets at the national level and of news headlines with a dominant sentiment at the province level are shown by colored dots. b, c Spatial distribution of the numbers of news headlines whose province-level dominant emotion type is positive (the sum of 14 positive emotion types) or negative (the sum of 25 negative emotion types) under the national (I) and local (II) drought conditions, respectively.

The spatiotemporal patterns of emotion in the news headlines vary across the 16 cities and provinces and between the national and local drought conditions (Fig. 2b, c). Negative news headlines are dominant in Chungcheongbuk-do (CB), Seoul (SU), Gwangju (GJ), and Jeollanam-do (JN) under the national drought conditions. Negative news headlines remain in GJ and JN during the local drought period. Local media companies in GJ nearly double their negative news headlines (from 49 to 92 news articles) when the local drought condition is severe. When the drought ends in early May 2023, local media companies in GJ and CB release news articles with positive headlines. These results indicate that some local media companies keep an eye on the drought condition even after the drought termination.

Before the national drought conditions (March 2022), the headlines of 25 news articles are negative, with no increase of tweets, when almost all provinces of South Korea are close to scEDI of −0.5 (Fig. 3). This result indicates that some local mass media companies report the current drought condition as an early drought warning message, but Twitter/X users do not react. These news articles are released by local media companies in drought-onset cities and provinces, including Seoul (SU), Gwangju (GJ), Gyeongsangbuk-do (GB), and Gyeongsangnam-do (GN). When the drought is nationwide in June 2022, the sentiment of the news headlines differs across the provinces (Fig. 2b). Local media companies in Chungcheongbuk-do (CB) and Jeollanam-do (JN) release negative news headlines, while positive news headlines are spread by local media companies in Gyeongsangbuk-do (GB), Gyeongsangnam-do (GN), Daegu (DG), and Daejeon (DJ). Local mass media companies in SU and GJ keep news headlines negative.

Fig. 3: Temporal variations of sentiment in tweets and news headlines.

Black (green) lines in (a) and (b) depict daily numbers of negative and positive tweets (news headlines) from March 2022 through May 2023.

Interplay of public search behavior with media attention

Results show that the Google Trends and NAVER DataLab data are overall consistent with each other (Fig. 4a). The total online search activity volumes from the Google Trends and NAVER DataLab data are 3539 and 3898, respectively, from January 2022 through June 2023. Both datasets show that relevant online search activity volumes increase in early May 2022 and reach the peak in late June 2022, with a second peak in early April 2023. News articles show an abrupt increase in June 2022, with peaks of 129 and 135 news articles on 7 and 14 June 2022, respectively, which is in the same week as the occurrence of the maximum online search activity volume.

Fig. 4: Associations of Twitter/X posts and NAVER news headlines with online search activities during the 2022–23 Korea drought.

a Weekly and daily online search activity volumes from Google Trends (white bars) and NAVER DataLab (green line). b Numbers of daily total Twitter/X posts (black line) and news articles (green line). The inset graph in (b) depicts the numbers of daily total Twitter/X posts and NAVER news headlines during the national drought period (I). Circles in (b) mark the dates of the issuance of the drought countermeasure subsidies, the landfall of Typhoon Hinnamnor, and the recovery from the 2022–23 drought, respectively.

The number of Twitter/X posts is 43,664 on 16 June 2022 (Fig. 4b). A surge of Twitter/X posts about drought follows the first (second) peak of news articles nine (two) days later. The timing of the surges of news articles and Twitter/X posts is consistent with that of relevant online search activity volumes, indicating possible interactions between local news outlets and social media through online information seeking and sharing activities. According to the annual report of National Drought Information Statistics (Ministry of Interior and Safety, 2024), governmental countermeasures with grants of the special subsidy tax for drought are released on 7 June 2022, which might trigger the peak of drought-related news.
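The nine-day and two-day lags quoted above follow directly from the dates given in the text. A minimal sketch of that arithmetic is shown here; the variable names are illustrative only and are not taken from the study's code.

```python
from datetime import date

# Dates taken from the text: news-article peaks and the Twitter/X surge.
news_peaks = {"first": date(2022, 6, 7), "second": date(2022, 6, 14)}
tweet_surge = date(2022, 6, 16)

# Lag (in days) from each news peak to the Twitter/X surge.
lags = {name: (tweet_surge - peak).days for name, peak in news_peaks.items()}
print(lags)  # {'first': 9, 'second': 2}
```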

On 16 June 2022, about 30% of tweets address the ticketing schedule of a water-spraying concert on the same date, which is a controversial topic during the ongoing drought because the drinking water usage for the concert amounts to 300 tons in a day. When the drought extent shrinks and centers on the southwestern region, the local media companies release news articles about the ongoing drought, and relevant online search activity volumes are high. However, Twitter/X posts show no corresponding increase.

Disparity of sentiment in news outlets and social media

While negative Twitter/X posts show different dominant emotion types during the national and local drought conditions, the dominant emotion types of news headlines are consistent (Fig. 5). During the national and local drought periods, the news headlines show three major emotion types: Expectancy (36.8% vs 44.3%), Anxiety (29.7% vs 23.0%), and Disappointment (10% vs 10.7%). Together, the corresponding news articles account for over 70% of the total news articles regardless of the drought spatial extent. During the national drought period, the dominant emotion types in tweets are Disappointment (15.7%), Dissatisfaction (14.3%), and Preposterous (13.5%). During the local drought period, the dominant emotion types in tweets are Anxiety (23.2%), Disappointment (13.5%), and Expectancy (12.2%). The detailed proportions of emotion types are provided in Supplementary Table S2. To assess the impact of the bootstrap sampling method, we further conduct stratified sampling to estimate the uncertainty range of emotion scores of news headlines by province under the national and local drought conditions. Under the national drought conditions, news headlines with ‘Expectancy’, ‘Anxiety’, and ‘Disappointment’ are 36.9%, 29.7%, and 10%, respectively (95th percentile range: 34.1–39.8%, 27.2–32.3%, and 8.3–11.7%, respectively). During the local drought conditions, the dominant emotion types of ‘Expectancy’, ‘Anxiety’, and ‘Disappointment’ are 44.4% (40.9–47.8%), 23.2% (20–26.3%), and 10.7% (8.4–12.9%), respectively. The consistency between stratified and random sampling confirms the robustness of the bootstrapping method during both the national and local drought conditions.
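The percentile ranges reported above come from resampling with replacement. The sketch below illustrates one way such a percentile bootstrap interval for an emotion share could be computed. It is not the authors' implementation: the function name, the seed, and the toy label list (built to mimic a 36.8% ‘Expectancy’ share among 1000 hypothetical headlines) are assumptions made for illustration.

```python
import random

def bootstrap_share_ci(labels, target, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the share of `target` in `labels`."""
    rng = random.Random(seed)
    n = len(labels)
    shares = []
    for _ in range(n_boot):
        resample = rng.choices(labels, k=n)  # random sampling with replacement
        shares.append(sum(1 for x in resample if x == target) / n)
    shares.sort()
    lower = shares[int((alpha / 2) * n_boot)]
    upper = shares[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper

# Hypothetical toy data: 1000 headline labels, not the study's dataset.
labels = (["Expectancy"] * 368 + ["Anxiety"] * 297
          + ["Disappointment"] * 100 + ["Other"] * 235)
point = labels.count("Expectancy") / len(labels)
lower, upper = bootstrap_share_ci(labels, "Expectancy")
print(f"Expectancy share: {point:.3f} (95% range: {lower:.3f}-{upper:.3f})")
```

A stratified variant would resample within each province separately before pooling, which is one plausible reading of the stratified check described above.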

Fig. 5: Emotional changes in Twitter/X posts and NAVER news headlines during the 2022–23 Korea drought.

Top three dominant emotion types and percentages in the Twitter/X posts (a) and the news headlines (b) under the national and local drought conditions ((I) and (II), respectively). Numbers in square brackets give the lower and upper bounds of the 95th percentile range of 1000 bootstrap samples (random sampling with replacement).

A further text analysis demonstrates that the top 10 most frequent words in tweets and news headlines are different before and during the national drought period (Fig. 6). Seven out of the ten most frequent words in tweets are common before and during the national drought period. Most of the 10 most frequent words in news headlines are different during the national and local drought conditions. During the national drought period, the words from tweets are related to individual circumstances related to drought, while those from news headlines are related to management and governmental countermeasures. We further analyze the vocabulary differences in the two sets of data under different emotions during the national drought (Supplementary Table 2). Although the high-frequency words are mostly the same, specific words under dominant emotions provide a clear understanding of how the same emotions are expressed differently in social and news media. The high-frequency words in NAVER news and Twitter/X are similar, focusing on words such as ‘drought’, ‘water’, and ‘rain’. News media express the emotion of ‘Anxiety’ indirectly by reporting the impacts, using terms such as ‘deficit’ or ‘region’. In contrast, social media express ‘Anxiety’ more directly, using terms like ‘Worry’. The ‘Expectancy’ from Twitter/X indicates that the public is looking for countermeasures and solutions, with expressions such as ‘Resolution’ and ‘Remedy’. These results suggest different roles and purposes of news media and social media in risk communication.

Fig. 6: Top 10 most frequent words in tweets and news headlines before and after the 2022–23 drought.

Blue and red bars depict the top 10 most frequent words in Twitter/X (a) and NAVER News (b) before (August through September 2020) and under the national drought conditions (May through June 2022), respectively.

Discussion

This study is a case study of a recent South Korea drought, focusing on the associated social impacts on the alteration of risk communication patterns in mass and social media and utilizing natural language processing and multiple digital trace data. This study demonstrates that digital trace data is an untapped resource for understanding the characteristics of socioeconomic drought, which is the least understood drought type. We find that negative news headlines increase when the drought develops slowly, indicating that local news outlets keep an eye on the condition of an emerging drought and send readers an early drought warning message. Conversely, relevant online search activity volume and Twitter/X posts are low, indicating a lack of situational awareness and perceived risk of the ongoing drought. This finding supports the information diet and selective exposure theories (Johnson, 2012; Knobloch-Westerwick, 2014).

This study demonstrates strong associations between the timing of the peak of attention in mass media and that of public interest in the ongoing nationwide drought conditions. Results demonstrate that the online search activity volume reaches its maximum in the same week in which the number of news articles is highest, indicating a complementary effect between news media and public attention. This study finds that a surge of Twitter/X posts about drought follows the first peak of news articles and information-seeking activities nine days later. Based on the results, we propose that the complementary relationship between attention from news media and from the public would elevate the public’s situational awareness (i.e., amplification of social risk) and motivate them to get actively involved in risk communication through social media like Twitter/X after a certain lag. Further investigations, such as questionnaire survey-based studies, are required to test the validity of the proposed mechanism for the interplay of news media, social media, and public search behavior.

Additionally, this study shows social media silence alongside active local media engagement before the nationwide drought conditions and during the localized drought conditions. This result implies that local media may serve as an early warning system consistently, regardless of the drought extent or severity, while social media lags until the drought conditions are severe or nationwide. During the localized conditions, social media silence may be a manifestation of media substitution theory in localized crises (i.e., the public tends to rely more on authoritative local media than on social platforms; Muhammed et al., 2022), which requires further investigation.

This study shows diverse patterns of risk communication in web search engines and media. Results show that the attention of the public and local mass media is high even when the drought condition is locally intensified. Tweets, however, show no discourse relevant to the local drought. The results imply that local mass media companies act as information providers about the current drought condition and associated contingency plans, including the special subsidy tax for drought. High volumes of online search activity by the public indicate a strong demand for relevant information about the drought conditions and contingency plans from those seeking ways to mitigate the adverse effects of the ongoing drought. Social media, in contrast, is sensitive to the drought spatial extent.

This study also finds that the dominant emotion types in news headlines (Expectancy, Anxiety, and Disappointment) are consistent, while the dominant emotions in Twitter/X posts shift. This disparity in the risk communication patterns originates from the different environments of news and social media. News media has strong editorial gatekeeping through institutional filters (i.e., verification, editing, legal/ethical oversight), while social media is filtered algorithmically (that is, with weak editorial control). News media is informational and responsibility-oriented, while social media is expressive and relational, prioritizing immediacy and visibility. These different environmental factors may cause the disparity in sentiment over the recent multi-year South Korea drought.

However, this study has limitations and challenges. Firstly, Big Data from web search engines, news, and social media have uncertainties. In this study, data preprocessing is conducted to filter irrelevant Twitter/X posts before the sentiment analysis. The Twitter/X data contain substantial irrelevant information even with specific keywords, which can drown out the voices that speak about the actual drought and interfere with the results (Dzyuban et al., 2022). Even after the data preprocessing procedure, irrelevant tweets may still be included in our sentiment analysis, but the emergence of a severe drought introduces a surge of tweets, which smooths out the influence of irrelevant tweets.

Secondly, a spatial comparison of sentiment and emotion of the news headlines and tweets is not conducted in this study because most tweets have no geotag information. Due to limited geographic information, the randomly sampled Twitter/X data may be biased towards densely populated metropolitan areas such as Seoul, possibly resulting in a representativeness bias that reflects the situation in metropolitan areas. This potential bias might imply that the absence of increased tweet volume may be attributed to low attention from outside the affected region, which might support the information diet and selective exposure theories. Conversely, increased internet search activity volumes may be attributed to high attention from inside and outside the affected region, possibly leading to more news reports on the ongoing drought. These results support the social amplification of risk framework in drought-specific contexts.

Another limitation comes from our assumption that the audience of local media is limited to the respective province, which is not the case in reality. To validate the spatial correlation between “provincial news emotions” and “local social response,” the primary information sources of the public need to be surveyed across the districts of South Korea. Such a survey would provide another opportunity to test the media substitution effect from the emotional and sentimental perspective in risk communication. In this study, the interaction sequence from news/public search to social media is inferred through lagged correlation. However, interference from third variables cannot be excluded. For example, the drought emergency subsidy was issued by the South Korean government in June 2022, which may have triggered news reports, public searches, and social discourse. Moreover, the landfall of Typhoon Hinnamnor in August 2022 (Fig. 4b) can play a role as a breakpoint in the attention from news media, public interest, and social media, which requires future investigation with a careful regression discontinuity analysis using additional data such as the policy implementation date and the landfall date of Typhoon Hinnamnor.
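The lagged-correlation inference mentioned above can be illustrated with a short sketch: one series is shifted against the other, and the lag with the highest Pearson correlation is taken as the candidate lead time. The series below are synthetic, the helper names `pearson` and `best_lag` are hypothetical, and the study's actual procedure may differ; as noted above, a high lagged correlation does not rule out a common third driver.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def best_lag(leader, follower, max_lag):
    """Correlate leader[t] with follower[t + lag] for lag = 0..max_lag."""
    scores = {}
    for lag in range(max_lag + 1):
        x = leader[:len(leader) - lag]
        y = follower[lag:]
        scores[lag] = pearson(x, y)
    return max(scores, key=scores.get), scores

# Synthetic daily counts (not the study's data): the second series
# repeats the first one three days later.
news = [0, 1, 5, 9, 4, 1, 0, 0, 0, 0, 0, 0]
tweets = [0, 0, 0] + news[:-3]

lag, scores = best_lag(news, tweets, max_lag=5)
print(lag)  # 3
```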

Emotion scores from the KcELECTRA model used in this study are uncertain because scoring the 44 emotion types is challenging for relatively short texts such as news headlines and tweets (Jeon et al., 2024). It is also challenging for current NLP packages to recognize irony, slang, and ambiguous sentences (Abu-Salih et al., 2023). This study relies on a Korean-language model, KcELECTRA, which classifies the 44 emotion types into three categories: positive, negative, and neutral. This classification is uncertain in text-based emotion analysis because some emotion types are difficult to distinguish; in particular, neutral emotion types such as surprise and realization are context-dependent and can manifest positively or negatively. This can oversimplify emotion dynamics and may distort polarity assignment. However, we find that Twitter/X posts and news headlines with the neutral emotion types account for a small proportion of the data used in this study (Arrogance (0.1%), Resolute (0.2%), and Realization (1%)), implying that these “neutral” emotion types play a minor role in drought risk communication.

Moreover, KcELECTRA has had only limited validation for cultural or linguistic biases in emotion detection in drought-specific contexts, which calls for the design of “manual correction rules” to reduce sentiment misjudgment for high-frequency ambiguous expressions (e.g., “Finally it rained, but not enough”). Despite these limitations, this study indicates that Big Data and AI are still highly beneficial for monitoring sentimental changes in risk communication through mass and social media in near real-time and for testing the existing theories in risk communication at the aggregate level.

Conclusions

This study demonstrates interactions of the drought characteristics with local news outlets and social media during the 2022–23 South Korea drought. Local news outlets publish relevant articles and the public seeks related information online when the drought conditions are severe, regardless of their spatial extent. Twitter/X posts are most frequent during the national drought conditions but are relatively few during the local drought conditions. We find that, under the national drought conditions, the words from tweets are related to individual circumstances related to drought and those from news headlines are related to management and governmental countermeasures. Our results from sentiment analysis show that three dominant emotion types, ‘Expectancy’, ‘Anxiety’, and ‘Disappointment’, account for over 70% of the news headlines regardless of the spatial extent, but the dominant emotion types of tweets change between the national and local conditions.

Climate change is expected to cause more frequent droughts and more intense rainfall over East Asia, including South Korea. This study highlights the value of true cross- and trans-disciplinary collaboration, particularly across natural science (climate/atmospheric science), engineering (water resources engineering and AI), and social science (sociopsychology and linguistics), in revealing hidden dynamical patterns of risk communication, particularly for drought, which is still the deadliest natural disaster in the world. This study hints at how AI can help manage drought risk in a timely manner and support the transformation toward drought-ready communities through “human social sensing”.