Introduction

Cultural heritage-based tourism is regarded as one of the most significant and fastest growing areas within the tourism industry (World Tourism Organization, 2015). The appearance of the coronavirus in 2019 (COVID-19) has caused health and economic crises around the world. Among the industries affected by the pandemic in terms of economic losses, the tourism industry has been one of the hardest hit (Abbas et al., 2021; Gössling et al., 2020). The pandemic has substantially reshaped the global tourism landscape, transforming both the perceptions of risk and the behaviors of tourists and industry stakeholders alike (Cooper et al., 2022; Park et al., 2020; Rather, 2021; Zheng et al., 2020). From the perspective of tourists, for example, health safety concerns and inconvenience during travel to and from sites, both triggered by the pandemic, have created significant hesitation regarding travel decisions (Chan, 2021). Concurrently, the pandemic has shifted tourism industry stakeholders from viewing global tourists as ‘ambassadors’ to viewing them as ‘undesired guests’, as the primary focus has been on implementing stringent preventive measures to minimize the transmission of the virus within destinations (Cairns and Clemente, 2023; Çakmak et al., 2023; Erayman and Çağlar, 2022; Korstanje and George, 2021). Such changes in perceptions and behaviors have likely had a substantial impact on the visiting experience of cultural heritage-based tourism, and thus warrant further investigation.

Despite the COVID-19 pandemic being a global health crisis, geographical factors have contributed to variable impacts of COVID-19 (Read, 2022; Sigala, 2020). While the pandemic has had significant negative impacts on a global scale, which have been documented in several countries including India, Australia, and Malaysia (Flew et al., 2021; Foo et al., 2021; Jaipuria et al., 2021; Kumar et al., 2020; Sah et al., 2020), it should be noted that the impact of COVID-19 on the tourism industry varied between countries, particularly dependent on their level of economic development. Specifically, while developed economies have experienced significant losses in the tourism industry due to the pandemic, developing economies may have faced even more devastating consequences, regardless of the scale of loss of life due to COVID-19. This paradoxical phenomenon is due to the fact that some developing economies have become overly dependent on the tourism industry, making them more vulnerable to the negative secondary consequences of the pandemic, such as the closure of tourism-related businesses, job losses and interrupted transport and supply chains (Barbosa et al., 2021; Brunn et al., 2022; Ranasinghe et al., 2021).

The impact of the pandemic on heritage organizations and tourism has varied with geography and the characteristics of individual sites. For example, Vaishar and Šťastná (2022) argued that economic uncertainty and travel restrictions that limit international tourism during COVID-19 have caused severe negative impacts on urban tourism in the Czech Republic; however rural tourism that is predominantly focused on domestic tourists was less affected. Also, outdoor sites, such as open-air museums and parks and gardens, were more favorable to visitors due to perceived lower associated risks of COVID-19 transmission and thus likely to be more resilient during pandemic measures compared to indoor sites (Landry et al., 2021). In the UK, the closure of heritage sites during the peak season (i.e. Easter until September) where organizations earn up to 70% of their annual turnover, resulted in a difficult situation (UK Parliament, 2020) where 82% of heritage organizations reported high or moderate risk to their organization’s long-term viability (UK Parliament, 2020b). Concurrently, some researchers and professionals believe that the pandemic was an opportunity to reshape and reinvent the sector and its practices (Smith, 2021). During COVID-19 there was also a visible surge in digital tools for audience engagement (Samaroudi et al., 2020). Heritage organizations had to adopt a proactive approach where various stages of the pandemic required a distinct approach (Cui et al., 2023). This illustrates some of the management challenges the cultural organizations faced due to the pandemic.

Three years on from the initial outbreak of the COVID-19 pandemic, although the long-term impact of COVID-19 is still unclear, this crisis is reaching a more mature, yet still threatening stage (Assaf et al., 2022). For instance, a UNESCO (2020) report estimated that about 10% of museums globally may never reopen. More recently, an Art Fund report (2022) found that income and visitation levels have increased but are still only 68% and 61% of pre-pandemic levels, respectively. Against this background, strategies to speed up the recovery of the tourism industry and prepare for the future should be proposed and adopted (Assaf et al., 2022; Korstanje et al., 2022). While the impact of the pandemic on the tourism industry in the UK may not be as devastating to the national economy as the proportional impact in many developing economies, the tourism industry in the UK was severely affected by the pandemic (Office for National Statistics, 2021). Thus, it is still an important issue that requires attention, since heritage tourism is a vital component of the UK economy that provides significant economic value and employment opportunities (Oxford Economics, 2016). This is highlighted by the statistic that, in 2019 (pre-COVID), the heritage sector in the UK provided a total gross value added of £36.6 billion and over 550,000 jobs (Historic England, 2020). Moreover, the abundance of online social media data pertaining to UK heritage sites generated during the pandemic provides a valuable opportunity to examine the broader social media responses towards disruptive events like COVID-19. The COVID-19 pandemic is therefore a suitable case study for demonstrating the effectiveness and limitations of the methodology proposed in this study, which employs advanced machine learning models to extract information from noisy social media data.

In this paper, we first quantify the impact of COVID-19 in terms of the number of visitors to heritage sites, using the number of comments from an online review platform as the proxy. Secondly, we investigate how the pandemic has affected different dimensions of the visitor experience. To that end, we apply sentiment analysis to measure visitor perceptions towards the impact of COVID-19. The negative impact caused by COVID-19 may arise from two causes: the mental and physical health consequences of being affected by the disease, and the inconvenience during travel caused by the policies that local governments imposed to stop the spread of the virus. In this paper, we focus on the latter, namely the changes in experience resulting from non-pharmaceutical preventive measures related to COVID-19.

Our study aims to achieve two main goals: (1) To analyze impacts related to COVID-19 on visitor numbers and experiences at heritage sites in the UK. (2) To exemplify the efficacy of sophisticated machine learning techniques in interpreting visitor sentiments based on their social media comments, serving as an illuminating framework for potential future application in socio-cultural research. To accomplish this, we deploy a blend of deep learning-based weakly supervised natural language processing (NLP), zero-shot learning NLP, and computer vision (CV) models. We apply these tools to an expansive dataset of social media content, containing over 1 million review comments for over 750 heritage sites in the UK, to conduct an empirical evaluation of the COVID-19 pandemic’s effect on UK heritage-related tourism.

To our knowledge, this empirical study with state-of-the-art machine learning models on such an extensive dataset is both novel and unique. However, it is important to acknowledge the inherent challenges in this approach, and thus potential limitations:

  1. Language ambiguity: complex, polysemous natural language can cause ambiguity in interpretation.

  2. Uneven topic coverage: unstructured data, like user reviews, may show disproportionate topic representation.

  3. Passively collected data: mining user reviews limits purposeful questioning compared to structured surveys.

More detailed discussions regarding the limitations can be found in the section “Limitations”. Despite these limitations, this exploration yields valuable insights to facilitate the recovery of the relevant sectors from the pandemic’s repercussions, and provides evidence-based recommendations for effective site management strategies amid the ‘living with COVID-19’ policy, shedding light on how such methods can be used to conduct relevant social studies in future research.

Background

Coronavirus disease (or COVID-19 for short) is an infectious disease caused by the SARS-CoV-2 virus. COVID-19 was declared a pandemic by the World Health Organization (WHO) on March 11, 2020, and has since spread globally. COVID-19 was confirmed to be spreading in the UK by the end of January 2020, with the first confirmed deaths in March of the same year. Since then, it has resulted in more than 22 million confirmed cases in the UK and was associated with 211,684 deaths by the end of 2022 (UK Government, 2022).

To prevent the spread of the virus, the UK had three national lockdowns, in which people were required to “stay at home” as much as possible. Most public places were therefore affected by these lockdowns, although to varying degrees (i.e., closed or restricted to public visiting). The first national lockdown came into effect on 26 March 2020 and was gradually relaxed in June 2020. Measures, including social distancing and mandatory face coverings, were introduced after the first national lockdown. The second national lockdown was enforced for 4 weeks, from 5 November 2020 to 2 December 2020, aiming to contain the outbreak and flatten the curve of cases during winter. Finally, the third national lockdown started on 6 January 2021. The exit from the third lockdown was gradual and multi-staged: on 8 March 2021, recreation in an outdoor public space was permitted between two people; on 29 March 2021, outdoor gatherings of either six people or two households were permitted; and on 12 April 2021, public buildings and outdoor venues, including museums, libraries, zoos, and theme parks, were allowed to reopen. Finally, on 19 July 2021, the third national lockdown ended. However, several measures, such as face covering requirements, were kept until February 2022, when the government’s plan for “living with COVID-19” was issued (UK Government, 2022b).

Alongside lockdown measures, there were several partial or full closures of heritage sites, which in many cases have continued following the end of lockdown measures. According to one survey (VisitEngland, 2021), over a third of attractions were unable to open for their typical visitor seasons once national lockdown measures were lifted. This was predominantly driven by continued restrictions due to regional lockdowns and difficulties in meeting the requirements to open safely during the pandemic, although profitability and a lack of staff volunteers were also important factors.

As previously mentioned, the UK government has adopted several measures since the outbreak of COVID-19 to curb the spread of the virus in public places during and after national lockdowns. Guidance for preventing the spread of the virus in public places was issued by the UK Government (2021), in which three measures were specifically mentioned:

  1. Social distancing: encouraged the public to keep apart (2 m or over 1 m) from people they do not live with in public spaces.

  2. Face coverings: encouraged the public to wear a face covering in crowded and enclosed spaces where they may come into contact with other people they do not normally meet.

  3. Cleaning and hygiene: requested the owners and operators to implement cleaning protocols to limit coronavirus transmission in public places, with a particular focus on touch points. In addition to the suggested cleaning protocols and posters, cleaning equipment such as hand gel was usually provided.

These measures were in place for most of the coronavirus pandemic before they were withdrawn in February 2022. As the operating models of many heritage sites depend on visitors, these measures likely had a profound impact on visits during the pandemic.

Apart from these general measures, there were also short-lived policies, such as requiring an NHS COVID-19 passport or proof of a negative test result within a certain recent timeframe for entrance to public spaces. As these policies held only for a short period of time (especially during spikes in COVID-19 cases), they are unlikely to account for a significant portion of visitors’ opinions regarding COVID-19 during their visit. Therefore, in measuring visitors’ sentiment towards the impact of COVID-19 on their travel experiences, we selected four aspects of heritage site visits: social distancing, face coverings, hygiene (cleanness, provided equipment such as hand sanitizer) and restrictions (restricted or closed areas).

Many studies have leveraged social media data to analyze the impact of COVID-19 on different aspects of society. For example, Ginzarly and Srour (2022) examined discourse emerging from cultural heritage content shared online during the COVID-19 pandemic. Through analyzing hashtag data collected from Instagram using latent Dirichlet allocation (LDA) and sentiment analysis package (syuzhet), the paper concluded that positive topics of social values, including safety, inclusion, participation and resilience, and diverse cultural expressions were the most shared by the users during the COVID-19 pandemic. Their results also showed that users approach the virtual space as a substitute for the loss of physical access through terms like ‘home’, ‘virtual’, ‘online’, ‘travel tomorrow’, and ‘museums from home’.

In examining the impact of COVID-19 on Singapore, Ridhwan and Hargreaves (2021) used VADER, a Lexicon-based method, for sentiment analysis and a recurrent neural network model pre-trained on the emotion classification dataset for emotion analysis. The study used data collected from Twitter between 1 February 2020 and 31 August 2020. The results of the study reveal that nearly half (45%) of all tweets expressed joy, while 30% expressed fear. The topic of staying at home pertaining to COVID-19 was the dominant topic in the tweets. Public health topics mainly expressed positive sentiment, including topics of social distancing, the encouragement to stay at home and stay safe, as well as the wearing of face coverings, while travel and border restrictions caused by the pandemic situation were dominated by negative sentiments.

Sanders et al. (2021) used tweets collected from March to July 2020 to illustrate public attitudes toward the use of face coverings during the COVID-19 pandemic. The study performed clustering by applying a k-means algorithm in the embedded space to find semantically distinguished topics. Simultaneously, each tweet is labeled by a sentiment score using VADER. They also used a text summarization model to process each cluster (and its subclusters) using tweets at the centroid of each cluster and conducted a qualitative analysis based on the model’s outputs. The study found a consistently polarized Twitter discourse surrounding face coverings and an accompanying overall increase in negative sentimentality.

Lyu et al. (2021) also utilized Twitter data from 11 March 2020 to 31 January 2021 to investigate topics, sentiment and emotion expressed in COVID-19 vaccine-related content. LDA, syuzhet and a lexicon-based method are used to perform topic modeling, sentiment analysis and emotion analysis, respectively. The study found the topic about vaccination progress around the world was mostly discussed and was often driven by the key milestone steps in vaccination. The sentiment of vaccination was increasingly positive in general, and emotion analysis further showed that trust was the most predominant emotion expressed in tweets regarding vaccination.

These studies and the broader corpus of the literature suggest that visitor perception towards COVID-19 and associated measures can be understood by applying NLP techniques to data collected from social media. Methodologically, this paper follows this framework by adopting NLP methods to decode semantic information expressed in user comments towards COVID-19 measures implemented at UK heritage sites, and at the same time, advances the previous studies by applying state-of-the-art weakly supervised and zero-shot learning language models which are more capable of capturing semantic information from sentence contexts.

Data and method

This section will start by introducing the data included in this study. Afterward, we introduce the method used to categorize heritage sites into urban and rural and indoor and outdoor. This is followed by an introduction to the NLP models used to detect comments related to COVID-19. Then, we introduce how visitor involvement was measured, especially in terms of passenger flow, and based on this, how to estimate the degree of tourism recovery using the number of online reviews as the proxy. Lastly, sentiment analysis methods will be introduced to dig deeper into visitors’ sentiments towards different preventive measures used by heritage sites.

Data

This research aims to reveal the impact of COVID-19 measures on visitor numbers and visitor experience at UK heritage sites at the national level. Thus, we intended to include a large and diverse cohort of sites while ensuring that sufficient data was available to undertake robust analysis. Accordingly, the following inclusion criteria were set: (1) the site has received more than 100 reviews, to reduce the variance in estimation; (2) the site is primarily associated with cultural heritage and tourism; (3) places that play a role very similar to that of heritage sites in people’s leisure time (e.g., national parks or the London Eye). To find relevant sites, we included properties managed by large heritage organizations, including English Heritage, the National Trust, the National Trust for Scotland and Historic Environment Scotland. We also included relevant sites from compilation lists, including the most visited attractions in England in 2019 (published by VisitBritain.org), the list ‘A History of England in 100 Places’ published by Historic England, and the most visited museums in membership with the Association of Leading Visitor Attractions (ALVA) in 2019. We collected review data for 775 sites from Google Maps using the specified criteria and sources, starting from February 2006 (the earliest available data on Google Maps for the included sites) until April 2022. We selected Google Maps because its data is site-focused and each review is associated with a rating score given by the user, which can be used to complement the measurements of user sentiment.

We collected both textual comment data and imagery data for each site. For the collected textual review data, we excluded non-English data and non-text data (reviews that only contain ratings without textual comments), which resulted in ~1.4 million reviews included in the analyzed set. Each comment has a corresponding user-given rating score, ranging from 1 (worst) to 5 (best). 100 photos were collected for each site, with each photo uploaded by a different user; we randomly chose a single photo if multiple photos were uploaded by a user. Table 1 shows the management organizations and sources of selected sites with the corresponding number of comments from different sources. Places categorized as ‘Other’ in the table are mostly abstract and do not belong to any organizations or lists, such as London’s Chinatown or the River Thames. However, as they possess similar significance to managed substantive heritage properties, they have been included in this study. More details regarding which sites are included can be found at https://zenodo.org/record/8130804 (Liu et al., 2023).

Table 1 Distribution of the number of selected sites and the corresponding number of comments for the sites from these sources.

Outdoorness and urbanness

To quantify the variation in the impact of COVID-19 within the cultural heritage tourism industry, we classify sites in two ways: (1) indoor and outdoor and (2) urban and rural. Firstly, in order to separate indoor and outdoor sites, we apply places365 (Zhou et al., 2017), a convolutional neural network (CNN) CV model that is able to classify photos as being inside or outside, to calculate the percentage of outdoor photos among the 100 photos collected per site. Based on this measurement, we assign a site as an indoor or outdoor site. As a first approximation, sites were classified as either indoor or outdoor using a 50% threshold of ‘outdoorness’; however, as some sites do not allow photos to be taken in their interior spaces or of specific collections, visitors may not be able to fully share photos to show their indoor experiences. In addition, visitors may share photos taken on arrival, on departure or en route that only include exterior scenes. These factors cause a bias towards outdoor scenes, so a 50% threshold would not be representative. We therefore select several known outdoor sites (e.g., national parks) and calculate the 1% confidence interval of the distribution of their outdoorness. The lower bound of the confidence interval is used as the threshold for outdoor sites, i.e., sites with a proportion of outdoor photos below the threshold of 0.83 are classified as indoor sites.
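The indoor/outdoor assignment described above can be illustrated with a minimal sketch. It assumes each photo has already been given an 'indoor' or 'outdoor' label by a scene classifier; the function names and the toy inputs are hypothetical, and only the 0.83 threshold is taken from the text.

```python
# Threshold reported in the text; all function and variable names are illustrative.
OUTDOOR_THRESHOLD = 0.83


def outdoorness(photo_labels):
    """Return the share of a site's photos labelled 'outdoor'.

    Labels are assumed to come from a scene classifier run on each photo.
    """
    if not photo_labels:
        raise ValueError("at least one photo label is required")
    outdoor = sum(1 for label in photo_labels if label == "outdoor")
    return outdoor / len(photo_labels)


def classify_site(photo_labels, threshold=OUTDOOR_THRESHOLD):
    """Classify a site as 'indoor' when its outdoor share is below the threshold."""
    return "indoor" if outdoorness(photo_labels) < threshold else "outdoor"


# Toy inputs (invented for illustration): 100 photo labels per site.
castle_grounds = ["outdoor"] * 90 + ["indoor"] * 10
city_museum = ["outdoor"] * 60 + ["indoor"] * 40

print(classify_site(castle_grounds))
print(classify_site(city_museum))
```

Note that a site with 60% outdoor photos is still treated as indoor here, which reflects the bias towards exterior photos discussed above.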

Secondly, to classify sites as urban or rural, we used the Code Point dataset (Ordnance Survey, 2022) as the proxy for population density. More specifically, we calculated the density of existing postcodes surrounding a site (within a radius of 10 km). Sites in high (low) density regions are considered as urban (rural) sites. Similar to the previous strategy, given known sites that can be classified as urban sites (e.g., sites in the central area of Greater London), we calculate the threshold (24 postcodes per km²) and obtain urban/rural classification for all sites.
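A minimal sketch of the postcode-density measure is given below. It assumes that site and postcode locations are available as planar coordinates in metres, so that straight-line distance is adequate over 10 km; the function names and toy coordinates are hypothetical, while the radius and the threshold of 24 postcodes per km² come from the text.

```python
import math

RADIUS_KM = 10.0          # search radius from the text
URBAN_THRESHOLD = 24.0    # postcodes per km^2, from the text


def postcode_density(site_xy, postcode_xys, radius_km=RADIUS_KM):
    """Postcodes per km^2 within radius_km of a site.

    Coordinates are assumed to be planar (x, y) pairs in metres.
    """
    radius_m = radius_km * 1000.0
    site_x, site_y = site_xy
    inside = sum(
        1 for x, y in postcode_xys
        if math.hypot(x - site_x, y - site_y) <= radius_m
    )
    return inside / (math.pi * radius_km ** 2)


def classify_urbanness(density, threshold=URBAN_THRESHOLD):
    """Sites at or above the density threshold are treated as urban."""
    return "urban" if density >= threshold else "rural"


# Toy data (invented): many postcodes near the site, one beyond the radius.
dense = [(100.0, 100.0)] * 8000 + [(20000.0, 0.0)]
sparse = [(100.0, 100.0)] * 1000

print(classify_urbanness(postcode_density((0.0, 0.0), dense)))
print(classify_urbanness(postcode_density((0.0, 0.0), sparse)))
```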

As discussed in the introduction, the UK government treated indoor and outdoor spaces differently during the pandemic in terms of implemented preventive measures, and heritage sites in urban and rural areas are likely to have been affected by COVID-19 to varying degrees. Thus, by classifying heritage sites into indoor and outdoor and urban and rural sites, we are able to investigate, at a much finer level, how the impact of COVID-19 may vary across different types of heritage sites.

COVID-19 topic detection

We used keywords associated with COVID-19 (‘covid’, ‘coronavirus’, ‘social distance’, ‘social distancing’, ‘pandemic’, ‘delta’, ‘omicron’) to find topics related to COVID-19 among the collected user comments. A comment is classified as being related to COVID-19 if one of the keywords appears in the comment. In addition to this, a pre-trained language model with a natural language inference strategy is also applied. Specifically, a pre-trained language model is a neural network that has been trained on large natural language datasets, which gives the model a generalized, off-the-shelf ability to understand natural language in other circumstances. In this paper, as we are aiming to detect comments related to COVID-19, we used BERTweet-large (Nguyen et al., 2020), which is based on the RoBERTa architecture (Liu et al., 2019) and pre-trained on a dataset of 850 million tweets in English, containing 845 million tweets streamed from January 2012 to August 2019 and 5 million tweets related to COVID-19 from January 2020 to March 2020 (Nguyen et al., 2020).
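The keyword rule at the start of this step can be sketched as follows. The keyword list is the one given in the text; case-insensitive substring matching and the function name are assumptions made for illustration, since the exact matching rule is not specified.

```python
# Keyword list as given in the text.
COVID_KEYWORDS = [
    "covid", "coronavirus", "social distance", "social distancing",
    "pandemic", "delta", "omicron",
]


def mentions_covid(comment, keywords=COVID_KEYWORDS):
    """Return True if any keyword occurs in the comment.

    Assumption: matching is case-insensitive and substring-based.
    """
    text = comment.lower()
    return any(keyword in text for keyword in keywords)


print(mentions_covid("Great visit despite the COVID rules"))
print(mentions_covid("Lovely gardens and a nice cafe"))
```

A rule of this kind will also flag unrelated uses of ambiguous words such as ‘delta’, which is one reason to complement it with a language model.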

The natural language inference strategy refers to the fact that the natural language model will be used to predict the relationship (e.g., entailment) of two given sentences. For example, if the following two sentences are given to the model: (1) ‘a museum with multiple tourists visiting’ (premise) and (2) ‘some people are visiting a place’ (hypothesis), the model will be trained to predict the relationship between the two sentences as ‘contradiction’, ‘neutral’ or ‘entailment’ (in this example the correct label is ‘entailment’). We fine-tuned BERTweet on the multi-genre natural language inference (MNLI) task of the GLUE dataset (Williams et al., 2018) and prepared a template of ‘This sentence has user’s review about COVID-19’ as the hypothesis. We then take the estimated probability of ‘entailment’ for each comment paired with the hypothesis as the probability that the COVID-19 topic appears in the given comment. To reduce misclassification, particularly false positives, instead of the commonly used classification threshold of 0.5, we selected a threshold (0.92) that minimizes the appearance of COVID-19 topics before 2020. By detecting comments related to COVID-19, we are able to track, over time, visitors’ perception of the disruption to their experience caused by COVID-19 during their visits, and therefore estimate the degree of recovery of heritage tourism in the UK.
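One way to read the threshold selection is sketched below, under explicit assumptions: entailment probabilities have already been computed for each comment, candidate thresholds come from a small grid, and the lowest candidate that flags no more than a tolerated share of pre-2020 comments is chosen. The selection rule, function name, and toy scores are all hypothetical; the text reports only the resulting value of 0.92.

```python
def pick_threshold(scored_comments, candidates, max_pre2020_rate=0.0):
    """Pick the lowest candidate threshold that keeps pre-2020 detections rare.

    scored_comments: list of (year, entailment_probability) pairs.
    Comments written before 2020 cannot be about COVID-19, so any
    detection among them is treated as a false positive.
    """
    pre_2020 = [prob for year, prob in scored_comments if year < 2020]
    for threshold in sorted(candidates):
        flagged = sum(1 for prob in pre_2020 if prob >= threshold)
        rate = flagged / len(pre_2020) if pre_2020 else 0.0
        if rate <= max_pre2020_rate:
            return threshold
    return max(candidates)


# Toy scores (invented for illustration).
scored = [
    (2018, 0.30), (2019, 0.91), (2019, 0.55),
    (2020, 0.97), (2021, 0.93), (2021, 0.40),
]
threshold = pick_threshold(scored, [0.5, 0.7, 0.9, 0.92, 0.95])
detected = [(year, prob) for year, prob in scored if prob >= threshold]
print(threshold, len(detected))
```

Choosing the lowest acceptable threshold, rather than the highest, keeps as many genuine post-2020 detections as possible while still suppressing false positives.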

Impact of COVID-19 on visitor involvement using number of online reviews as proxy

In leveraging social media data to account for the impact of COVID-19 on heritage sites, we use the number of online comments on Google Maps as the proxy for measuring visitor involvement. This involvement can be separated into two aspects. First, it represents the willingness of visitors to share their experiences. It is possible that during the pandemic, visitors may have been more or less willing to leave comments online (i.e., a different probability of leaving an online comment after visiting). We collected the monthly actual numbers of visitors to the museums and galleries sponsored by the Department for Digital, Culture, Media & Sport (DCMS) of the UK government from January 2016 to March 2022 (UK Government, 2022c), and their corresponding number of online comments in the same period. Figure 1 shows how many visitors are needed, on average, for one online comment to be given for these government-sponsored sites. We observe that, except for extremely unusual periods when these sites were closed due to COVID-19 in most of 2020 and early 2021 (where the lines are disconnected), there is no significant change in the ratio between the number of visitors and the number of comments after 2016. Therefore, we assume that visitor willingness to leave an online comment does not change throughout the duration of this study for the included sites.

Fig. 1: Ratio between the number of visitors and number of comments (red line).

The shaded red area shows the 99% confidence interval. Break points of the line correspond to the closure of the sites during the pandemic due to lockdown, and the spike before the break point in 2020 is likely due to a significant decrease in the number of visitors, rather than a significant increase in the number of comments.

Secondly, and more importantly, this finding indicates that the number of comments is also an effective indicator of the actual number of visitors. Figure 2 shows the linear relationship between the log number of actual visitors and the log number of online reviews on Google Maps, with an R² of 0.73. Thus, assuming that visitors’ willingness to leave online comments does not change significantly compared to periods without heightened health risks, the number of online comments can reflect the actual passenger flow volume of the sites and is therefore an important metric.

Fig. 2: Relationship between log number of visitors and log number of reviews.

A regression line depicting the relationship between the log number of reviews on Google Maps (x-axis) and the log number of visitors (y-axis) for museums and galleries sponsored by the UK’s Department for Digital, Culture, Media & Sport (DCMS). The coefficient of determination (R²) for the regression line is 0.73.
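For readers who wish to reproduce this kind of check, a log-log regression with its coefficient of determination can be sketched in a few lines. The sketch assumes ordinary least squares with natural logarithms; the function name and the toy counts are hypothetical and are not the DCMS data.

```python
import math


def log_log_fit(reviews, visitors):
    """Ordinary least squares fit of log(visitors) on log(reviews).

    Returns (slope, intercept, r_squared).
    """
    xs = [math.log(r) for r in reviews]
    ys = [math.log(v) for v in visitors]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return slope, intercept, 1.0 - ss_res / ss_tot


# Toy data (invented): exactly 500 visitors per review, so the fit is perfect.
slope, intercept, r_squared = log_log_fit(
    [10, 100, 1000], [5000, 50000, 500000]
)
print(round(slope, 3), round(math.exp(intercept), 1), round(r_squared, 3))
```

With real data the points scatter around the line, and the R² value quantifies how much of the variation in log visitor numbers the log review counts explain.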

We collected inbound visitor numbers from a dataset of UK monthly overseas travel and tourism (Office for National Statistics, 2022). This dataset includes the number of inbound visitors to the UK subdivided by region of origin (Europe, North America and other countries) and purpose (holiday, business, visiting friends or relatives and miscellaneous). We merged the original classifications ‘North America’ and ‘Other countries’ into ‘Non-Europe’, as they are highly correlated.

In this paper, we quantify the impact of COVID-19 on heritage sites using the trends in the number of online comments and the number of comments related to COVID-19, to estimate the degree of recovery of heritage tourism at both the passenger flow level and the perception level. We also measured the correlation between the reduction (compared to 2019) in the number of online comments at both urban and rural heritage sites and the reduction in the number of inbound visitors by source region and purpose since the outbreak of COVID-19. This reveals whether there are significant differences in the impacts of the COVID-19 pandemic on heritage sites in urban or rural areas.
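The comparison described here can be sketched as follows. The sketch assumes that reductions are computed month by month against the same month of 2019 and that Pearson's coefficient is the correlation measure; neither detail is stated in the text, and the function names and toy counts are hypothetical.

```python
import math


def reduction_vs_baseline(counts, baseline_counts):
    """Fractional drop of each count relative to the matching baseline count."""
    return [1.0 - count / base for count, base in zip(counts, baseline_counts)]


def pearson(xs, ys):
    """Pearson correlation coefficient (assumed measure; not named in the text)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Toy monthly counts (invented): online comments and inbound visitors.
comment_drop = reduction_vs_baseline([900, 300, 150], [1000, 1200, 1500])
inbound_drop = reduction_vs_baseline([180, 50, 30], [200, 250, 300])

r = pearson(comment_drop, inbound_drop)
print([round(d, 2) for d in comment_drop], round(r, 3))
```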

Sentiment analysis

Sentiment analysis, a technique to systematically extract and quantify emotional and subjective information from textual data using NLP methods, is used to measure visitor perception towards the impact of COVID-19 on their experiences. As discussed above, in this paper we focus on measuring the change in visitor experiences caused by the policies and restrictive measures taken to stop the spread of COVID-19. We do not focus on visitors’ feelings towards their own physical and mental health, as most visitors will not travel when they feel unwell and these experiences are unlikely to be expressed in review comments.

The rating score given by the visitors for each comment is used as the expression of sentiment. However, 5-point rating scores are usually highly skewed in distribution (Hu et al., 2006). This also applies in this study, where the average score is 4.39, with more than 62% of the comments being 5-star rated. Thus, we fold the 5-star rating scores into a dichotomous classification, namely positive (rating score above the mean value) and negative (rating score below the mean value). We then use the binarized rating as the dependent variable.
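The binarization step amounts to a comparison against the sample mean, as the following minimal sketch shows. The function name and the toy ratings are hypothetical; a rating exactly equal to the mean is treated as negative here, which is an assumption that rarely matters for integer ratings.

```python
def binarize_ratings(ratings):
    """Map each rating to 1 (positive, above the mean) or 0 (negative)."""
    mean = sum(ratings) / len(ratings)
    return [1 if rating > mean else 0 for rating in ratings]


# Toy ratings (invented); their mean is 4.125, so only 5-star reviews are positive.
ratings = [5, 5, 5, 4, 3, 5, 1, 5]
labels = binarize_ratings(ratings)
print(labels)
```

With the mean of 4.39 reported above, the same rule labels 5-star reviews as positive and all lower ratings as negative.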

We conduct sentiment analysis at two levels: document-level and word-level. At the document level, we further classify the comments detected as COVID-19-related into four finer-grained subtopics regarding COVID-19 measures, namely face covering, social distancing, restrictions and closure of areas, and hygiene equipment, as introduced in the background section. Similarly to the strategy used in detecting topics related to COVID-19, we apply a combination of rule-based (keyword search) algorithms and likelihood-based models (language models). We use the keywords that are related to the four subtopics, which can be found in Table 2; a subtopic is classified as appearing in the comment if the keywords belonging to that subtopic are included in the comment. In addition, we draw on BFV (Liu et al., 2022), a weakly supervised text classification model, with the keywords above as the input to further model the likelihood of subtopics appearing in each comment. We use a fuzzy classification strategy to fuse the two results, i.e., when they both agree that the subtopic appeared (did not appear), the corresponding label is 1 (0), whereas when they disagree, the corresponding label is 0.5. Using the strategy above, we obtain the subtopic-document matrix and model its correlation with sentiment. The subtopic-document matrix is then used as the independent variable to build logit models for both indoor and outdoor sites to investigate how each subtopic is related to visitor sentiment.
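The fuzzy fusion rule can be written down directly. In the sketch below, the two detectors are represented only by their boolean outputs per subtopic; the function names, the subtopic identifiers, and the toy inputs are hypothetical, and the contents of Table 2 are not reproduced.

```python
SUBTOPICS = ["face_covering", "social_distancing", "restrictions", "hygiene"]


def fuse_labels(keyword_hit, model_hit):
    """Fuse two boolean detections into a fuzzy label.

    Agreement gives 1.0 (both present) or 0.0 (both absent);
    disagreement gives 0.5.
    """
    if keyword_hit == model_hit:
        return 1.0 if keyword_hit else 0.0
    return 0.5


def subtopic_row(keyword_hits, model_hits):
    """Build one row of the subtopic-document matrix for a single comment."""
    return [
        fuse_labels(keyword_hits[name], model_hits[name]) for name in SUBTOPICS
    ]


# Toy detector outputs for one comment (invented for illustration).
keyword_hits = {"face_covering": True, "social_distancing": True,
                "restrictions": False, "hygiene": False}
model_hits = {"face_covering": True, "social_distancing": False,
              "restrictions": False, "hygiene": True}

print(subtopic_row(keyword_hits, model_hits))
```

Stacking such rows over all COVID-19-related comments gives the matrix used as the independent variable in the logit models.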

Table 2 Keywords and BFV inputs used in detecting subtopics.

As for word-level sentiment analysis, we draw on a language model from NLP. Specifically, we train a simple sentiment language model using DistilBERT (Sanh et al., 2019) as a backend, in combination with a binary classification head, on the existing comments related to COVID-19 and their corresponding dichotomous sentiment ratings. Then, we use Integrated Gradients (IG) (Sundararajan et al., 2017) to calculate the gradient of sentiment with respect to each word, which represents the importance of the word in predicting the sentiment. We then aggregate the gradient of each word across all documents to represent the overall sentiment of each word. Compared to traditional methods (e.g., bag of words or lexicon-based methods), the advantage of this approach is that the language model can predict the sentiment from each word as well as its surrounding words (context) in a comment, since BERT is a context-aware language model. This dynamic information between sentiment and each word, recorded in the parameters of the trained language model, can then be extracted by IG. Therefore, the approach can more accurately capture the semantic information of each word.
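To make the attribution step concrete, the sketch below applies Integrated Gradients to a deliberately simple stand-in model, a logistic function over word counts, rather than to DistilBERT. The vocabulary, weights, toy documents, all-zero baseline and the choice to sum attributions across documents are assumptions for illustration only.

```python
import math
from collections import defaultdict

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict(x, weights, bias):
    """Toy sentiment model: probability of a positive rating."""
    return sigmoid(sum(w * v for w, v in zip(weights, x)) + bias)

def integrated_gradients(x, baseline, weights, bias, steps=200):
    """IG attributions for the toy model, using a midpoint Riemann sum along
    the straight path from the baseline to the input."""
    avg_grad = [0.0] * len(x)
    for k in range(steps):
        alpha = (k + 0.5) / steps
        point = [b0 + alpha * (xi - b0) for xi, b0 in zip(x, baseline)]
        p = predict(point, weights, bias)
        for i, w in enumerate(weights):
            # Gradient of sigmoid(w.x + b) with respect to x_i is p * (1 - p) * w_i.
            avg_grad[i] += p * (1.0 - p) * w / steps
    return [(xi - b0) * g for xi, b0, g in zip(x, baseline, avg_grad)]

# Hypothetical vocabulary and weights (not learned from the study's data).
VOCAB = ["closed", "restrictions", "covid", "lovely"]
WEIGHTS = [-2.0, -1.5, 1.0, 2.0]
BIAS = 0.0
BASELINE = [0.0] * len(VOCAB)

# Each toy document is a vector of word counts over VOCAB.
docs = [[1, 0, 1, 0], [0, 1, 0, 0], [0, 0, 1, 1]]

# Aggregate attributions per word across documents (summation is an assumption).
word_scores = defaultdict(float)
for doc in docs:
    for word, a in zip(VOCAB, integrated_gradients(doc, BASELINE, WEIGHTS, BIAS)):
        word_scores[word] += a

# Completeness check: attributions sum to F(input) - F(baseline).
first = integrated_gradients(docs[0], BASELINE, WEIGHTS, BIAS)
gap = sum(first) - (predict(docs[0], WEIGHTS, BIAS) - predict(BASELINE, WEIGHTS, BIAS))
print(dict(word_scores), gap)
```

Unlike this stand-in, a contextual model assigns a word different attributions depending on its neighbors, which is the property the approach relies on.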

The sentiment analysis enables this study to quantify visitor sentiment towards different preventive measures used in heritage sites and therefore provide useful evidence and feedback to inform their use in future heritage site management.

Results

Using the method introduced above, we detected 15,300 comments related to COVID-19 from the ~1.4 million reviews, covering 689 sites (of the 775 sites, 86 did not have any detected COVID-19-related comments and were therefore excluded from the subsequent analysis). Following the classifications of outdoorness and urbanness, we classify the 689 sites into indoor/outdoor and urban/rural sites, as shown in the cross-tabulation in Table 3.

Table 3 Cross-classification of urbanness and outdoorness for 689 sites containing reviews related to COVID-19.

Figure 3 shows the trends of comments on Google Maps from January 2016 to April 2022 for indoor sites and outdoor sites. From the end of 2021 onward, the comments related to COVID-19 start to decline for both indoor and outdoor sites, showing that visitors gradually mention COVID-19 less frequently at heritage sites, despite the Omicron variant of COVID-19 spreading rapidly across the UK. However, the figure shows that, up until mid-2022, large differences between the expected and actual numbers of visitor comments persist, indicating that the recovery of heritage tourism still lags significantly behind pre-pandemic levels, specifically in terms of passenger flow. In particular, the differences between the expected and actual numbers of comments are larger for indoor sites than for outdoor sites, consistent with the expectation that outdoor sites were likely to be more popular than indoor sites during the pandemic (Landry et al., 2021).

Fig. 3: Trends of comments on Google Maps from January 2016 to April 2022 for indoor sites and outdoor sites.

The blue bars show the number of comments for each month, the light blue bars show the expected number of comments calculated by ARIMA using the previous trends, and the red bars show the number of comments relating to COVID-19. The numbers above the light blue bars are the ratios (displayed as percentages) between the actual and expected numbers of comments, while the numbers above the red bars are the ratios (also as percentages) between the number of comments related to COVID-19 and the actual number of comments.

We then calculate the correlation between the reduction in the number of monthly comments and the reduction in the number of inbound visitors (benchmarked to 2019) for urban and rural sites, as shown in Fig. 4. Consistent with previous research showing that COVID-19 likely results in negative impacts on international holiday travel (Vaishar et al., 2022), for urban sites there are significant positive correlations between the reduction in the number of comments and the reduction in the number of inbound visitors from Europe, which is the largest source of inbound visitors to the UK, and in the number of inbound visitors travelling for holiday purposes. These correlations are not significant for rural sites. This shows that, compared to rural sites, urban sites are more severely affected by the reduction in inbound visitors, especially visitors from Europe and visitors travelling for holiday purposes.

Fig. 4: Correlation between the reduction in the number of monthly comments and the reduction in the number of inbound visitors (using data in 2019 as the benchmark) for urban and rural sites.

Rectangles show the 90% confidence interval and whiskers show the 95% confidence interval. Boxes without color indicate coefficients that are not significant at the 0.05 significance level.

The next step is the analysis of visitor sentiment towards COVID-19 at the document level and the word level. We chose four subtopics connected with COVID-19 that are of special interest and significance in the dataset. Figure 5 is a Venn diagram showing the number of mentions of the four subtopics and their intersections among comments related to COVID-19. Since the frequency of mentions of the four subtopics does not differ significantly across indoor/outdoor and urban/rural classifications, further details regarding the distribution of the four subtopics based on these classifications are not presented.

Fig. 5: Venn diagram showing the number of mentions of the four subtopics and their intersections among all comments that are related to COVID-19.

The number in each ellipse represents the number of comments detected with the corresponding subtopic. The number outside the ellipses at the bottom of the figure shows the number of comments that do not mention any of the four subtopics.

Figure 6 shows the correlations between sentiment (positive or negative emotion) and the four subtopics of COVID-19 for both indoor and outdoor sites. The figure shows an interesting difference in visitor attitudes towards sanitization and social distancing. Specifically, in indoor sites, mentions of sanitization equipment (e.g., hand gel and sanitization stations) are significantly associated with positive emotion, whereas this association is not significant in outdoor sites. This suggests that providing hygiene equipment is welcomed by visitors in indoor sites as it represents safety, while in outdoor places visitors may have fewer safety concerns and thus are indifferent toward the provision of this equipment. In contrast, social distancing is significantly related to positive emotion in outdoor sites, but not in indoor sites. A possible explanation is that, although social distancing measures (and one-way systems and related queuing) are welcomed by visitors as they reduce the risk of the virus spreading, indoor places where space is more constrained may have difficulties in implementing them effectively. Thus, complaints may arise and the sentiment becomes mixed. Finally, restrictions broadly, the closure of specific areas, and the wearing of face coverings are significantly associated with negative emotions in both indoor and outdoor sites, suggesting that they are consistently disliked by visitors.

Fig. 6: Correlations between sentiment and the four subtopics of COVID-19 for both indoor and outdoor sites.

Rectangles show the 90% confidence interval and whiskers show the 95% confidence interval. Boxes without color indicate coefficients that are not significant at the 0.05 significance level.

Figure 7 shows the sentiment analysis at the word level for all comments. Consistent with the document-level sentiment analysis, ‘closed’ and ‘restrictions’ are strongly negative, showing that they are frequently complained about by visitors. On the other hand, surprisingly, the term ‘COVID’ is strongly associated with positive emotion. This may be attributed to the fact that most comments mentioning COVID-19 express visitor excitement at returning to normal life after COVID-19 measures were lifted (e.g., the reopening of public places and the end of lockdown), so that ‘COVID’ acts as an indicator of positive emotion.

Fig. 7: Keywords and associated sentiment.

The length of the bar represents the aggregated gradients of the sentiment with respect to the word.

However, the analysis at both levels of granularity does not distinguish whether the negative emotion towards face covering and social distancing results from the discomfort of being required to follow the rules or from displeasure at other visitors’ disregard for the restrictions. Some people may appreciate the added safety and security that face covering requirements and social distancing can provide, especially during the ongoing COVID-19 pandemic, and thus disapprove of others who do not follow the measures; others, however, may find wearing face coverings and social distancing to be inconvenient or unnecessary, and may feel that these measures interfere with their ability to enjoy their visits. These two attitudes both cause displeasure but reflect opposite views of the anti-COVID-19 measures. Thus, following the approach utilized by Sanders et al. (2021), we implemented the same document summarization strategy but with a different model, Google’s Pegasus (Zhang et al., 2020), fine-tuned on the reddit_tifu dataset, which is better suited to the context of social media. Using the 25 negative comments closest to the centroid of the face covering and social distancing topics, we carried out a pseudo-qualitative analysis:

Summary generated from the model for negative comments of wearing face coverings: “had to wear a mask to escape the crowds of people not wearing masks.”

Summary generated from the model for negative comments of social distancing: “didn’t follow COVID-19 social distancing guidelines and had to walk around with no social distancing.”

From the two summarized sentences, we can observe that lax management and enforcement of the measures and the failure of other visitors to follow them are the main causes for complaint, rather than discomfort at a subjective level. This indicates that visitors generally accept the measures and expect other visitors to follow them to ensure their personal safety.
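The selection of comments closest to a topic centroid, which precedes the summarization step, can be sketched as follows. The two-dimensional vectors are made-up stand-ins for comment embeddings, and the use of Euclidean distance is an assumption; the analysis above used the 25 closest negative comments, whereas the toy example keeps three.

```python
import math

def centroid(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def closest_to_centroid(vectors, k):
    """Indices of the k vectors nearest to the centroid (Euclidean distance)."""
    c = centroid(vectors)
    order = sorted(range(len(vectors)), key=lambda i: math.dist(vectors[i], c))
    return order[:k]

# Made-up 2-D vectors standing in for embeddings of negative comments on one topic.
embeddings = [[0.0, 0.0], [1.0, 1.0], [0.9, 1.1], [5.0, 5.0], [1.2, 0.8]]
closest = closest_to_centroid(embeddings, k=3)
print(closest)
```

The comments at the returned indices would then be concatenated and passed to the summarization model.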

Discussion

Implications for management

This paper reveals several empirical findings that can inform the management of heritage sites during the recovery from COVID-19 and other pandemics in the future. Further, it can also inform the recovery plans being prepared at the national level to understand the efficient allocation of resources. Although visitor perception towards COVID-19 has been diminishing, the recovery of visitor involvement as reflected by the actual number of online comments (which is also associated with the actual passenger flow volume of the sites) is much slower. From the urban/rural perspective, this loss of visitor involvement is more obvious in sites in urban areas and is more strongly associated with international tourism. From the indoor/outdoor perspective, indoor sites have been more severely impacted compared to outdoor sites. Therefore, more supporting policies are needed to help the recovery of urban indoor cultural heritage sites that heavily rely on international travelers (e.g., museums and galleries in cities), even when the perceived impact of COVID-19 is not obvious among existing visitors. Also, these sites should have a financial plan in place to manage any financial losses that may occur due to unexpected closures or reduced international visitor numbers.

Through a pseudo-qualitative analysis and fine-grained sentiment analysis at both the document level and the word level, this paper reveals visitor sentiment towards different measures taken in response to the COVID-19 pandemic. More specifically, the results suggest that the measures are generally welcomed by visitors but need to be implemented effectively for all visitors and staff, and that visitors are disappointed when areas are inaccessible due to COVID-19. The provision of hygiene equipment is also welcomed, but it is only perceived positively when it is considered essential (in indoor settings where frequent touching of surfaces might occur). Thus, maintaining order on site, such as ensuring that staff (especially front-of-house staff) and visitors follow the COVID-19 prevention measures when it is crowded, and regularly reviewing and updating the emergency plan to keep visiting areas as accessible as possible while ensuring safety, are crucial for improving the visitor experience while pandemic measures are in place.

This study provides an example of how to extract useful information from visitor feedback as an alternative to traditional visitor surveys. Compared to these methods, collecting data from social media reduces human capital costs in distributing questionnaires and is contact free. Leveraging social media data can provide larger sample sizes from different sites and therefore reduce sampling bias. Also, given that writing online reviews is similar to answering open-ended questionnaires (Pietsch et al., 2018) and that the data obtained from open-ended questionnaires exhibit the same level of richness and similar ranking of importance compared to that obtained from close-ended questionnaires (Krosnick, 2018; Reja et al., 2003), the information contained in social media data can provide a comprehensive representation of visitor opinions that is comparable to traditional surveying methods. With the advancement of machine learning, especially NLP techniques, we will be able to mine increasingly diverse and valuable information from user-published social media data to help refine sustainable heritage management strategies. Lastly, this method allows us to take surveys and make analyses retrospectively. For example, even if a survey was not conducted before an unexpected event (e.g., earthquake), we are able to convert user comments from social media before and after the unexpected event into two equivalently structured surveys and compare them to investigate whether there is a significant difference that reflects the effects of the event on the visitor experience.

Limitations

In this paper, we quantitatively analyzed the impact of COVID-19 on cultural heritage sites based on visitor involvement (number of online comments) and sentiment towards COVID-19 measures using state-of-the-art NLP models that convert unstructured visitors’ reviews into structured questionnaire-like results with corresponding sentiment scores. The current breakthroughs in the deep learning realm that brought significant improvements to neural network models in terms of human language understanding provide the basis for the method proposed in this paper, which has several advantages such as larger sample sizes, lower costs and allowing for retrospective analysis as discussed above. However, there are still some limitations inherent to this method:

  1. Ambiguity in language is a concern. Natural language is complex and non-homogeneous among individuals, making it difficult to extract consistent and accurate meaning from unstructured text. Despite the limitation of ambiguity in language, it is important to note that this challenge is not unique to the method proposed in this paper. In fact, it is a prevalent issue in many other tasks, as long as they involve human language interpretation. For example, even in traditional surveys using questionnaires with a structured format, the language used in presenting the questions can lead to biases in answers (Fowler, 1995). However, this challenge could be especially prominent for studies involving social media data, where people tend to use informal, non-rigorous language, hindering the accuracy of expressing their views.

  2. The diversity of topics is also an important consideration. In traditional surveys typically used in heritage, researchers can include questions that would provide a sufficient basis on which to address the questions at hand. However, with unstructured data like user reviews, some topics may be more prevalent than others. This can result in an uneven distribution of data by topic, leading to higher statistical uncertainty for topics mentioned less frequently by reviewers. Additionally, while traditional surveying methods have well-established statistical tools for verifying, calibrating and adjusting survey results, the same may not necessarily be true for results converted by NLP methods from unstructured data like user reviews.

  3. Lastly, the passive nature of user reviews means that researchers cannot ask specific questions that may be important for their analysis. This can be a limitation, particularly if the research is hypothesis-driven and requires specific data to test a hypothesis. However, on the other hand, data-driven analysis allows researchers to explore the data and identify patterns and trends that may not have been initially considered (Tansley et al., 2009).

Beyond the generic limitations associated with the methods used in this paper, there are also limitations specific to applying the method to measuring visitors’ perceptions of COVID-19 measures at UK heritage sites. First, from the perspective of data, we collected online visitors’ opinions from the Google Maps platform, which involves some limitations. For example, the data may over-represent opinions from young visitors or people who are proficient in using social media platforms. In addition, as there is no strict curation with approval criteria for reviews uploaded to Google Maps, the quality of the social media data may be low. Lastly, we only used English data in our analysis. This may cause a bias toward opinions from native English speakers and ignore some international non-native English speakers, who account for a significant part of the consumers of heritage tourism in the UK. Thus, the results given by the models should be evaluated critically and with caution.

Conclusion

In this study, we collected user review data for 775 sites on Google Maps and analyzed it using state-of-the-art machine learning methods to detect and quantify the impact of COVID-19 on heritage tourism in the UK, aiming to (1) help the recovery of heritage tourism in the UK during the “post-COVID” era and summarize lessons for heritage tourism in preparation for the next potential severe public health emergency, and (2) showcase the efficacy of advanced machine learning techniques in interpreting unstructured data for potential relevant future research.

From the managerial implication perspective, this research provides critical insights for heritage site management. Notably, it reveals that although visitor perception towards COVID-19 (represented by reviews related to COVID-19) has significantly decreased, the actual number of comments had not yet (as of April 2022) returned to the number expected from pre-pandemic trajectories, suggesting that visitor involvement with heritage sites needs more time to recover. In particular, it underscores the need for enhanced support policies and financial planning for urban, indoor sites that heavily rely on international tourism, as these sites have seen a slower recovery of visitor involvement. Furthermore, the effective and consistent enforcement of COVID-19 preventive measures for both staff and visitors is crucial in maintaining a positive visitor experience during a pandemic. Areas should be kept as accessible as possible while ensuring safety to reduce visitor disappointment. Additionally, the value of mining social media data for visitor feedback is highlighted. This provides a cost-effective and contact-free alternative to traditional visitor surveys for heritage site managers. These practical implications can ensure a more robust and resilient response to crises, safeguarding the sustainability of the heritage tourism industry.

From a methodological development standpoint, the advanced machine learning techniques used in this study to extract information from online reviews have demonstrated their effectiveness in determining visitor perceptions towards specific facets of heritage sites. These methods can be harnessed to capture shifts in perception chronologically, particularly before and after unexpected or disruptive events such as conflicts, economic downturns, natural disasters, and disease outbreaks. These events often lead to profound impacts on the operations of heritage sites, which may or may not result in significant changes in the visiting experience. Therefore, the use of these advanced machine learning methods, as showcased in this study, highlights their potential for being adopted in future research. By leveraging these techniques, researchers can gain a deeper understanding of visitor sentiment and experiences, providing critical insights that can further enhance the management and preservation of heritage sites.