Abstract
Holistic interventions to overcome COVID-19 vaccine hesitancy require a system-level understanding of the interconnected causes and mechanisms that give rise to it. However, conventional correlative analyses do not easily provide such nuanced insights. We used an unsupervised, hypothesis-free causal discovery algorithm to learn the interconnected causal pathways to vaccine intention as a causal Bayesian network (BN), using data from a COVID-19 vaccine hesitancy survey in the US in early 2021. We identified social responsibility, vaccine safety and anticipated regret as prime candidates for interventions and revealed a complex network of variables that mediate their influences. Social responsibility’s causal effect greatly exceeded that of other variables. The BN revealed that the causal impact of political affiliations was weak compared with more direct causal factors. This approach provides clearer targets for intervention than regression, suggesting it can be an effective way to explore multiple causal pathways of complex behavioural problems to inform interventions.
Introduction
Two years after COVID-19 was declared a global pandemic, it remains the greatest public health crisis facing the United States. By December 2021, the country had suffered 48 million cases and 770,000 deaths1. The development and availability of vaccines has been a crucial step in preventing the worst clinical effects of the coronavirus. However, the pace of vaccination has stagnated, falling from a peak of 3 million new doses per day in April 2021 to 1.5 million daily in December 20212. Despite efforts to boost vaccination rates, a large proportion (29.6%) of the eligible US population remain unvaccinated1—many by choice—and thus vulnerable to severe illness and death should they become infected.
COVID-19 vaccine hesitancy is a complex behavioural problem3. There is no shortage of hypotheses on why: vaccine safety and side-effect concerns, politics, disinformation, race, perceptions, emotions, social norms, individual influences, knowledge, and economic factors have all been proposed4,5,6,7,8,9,10,11,12,13,14. Since randomised controlled trials (RCTs) are challenging or impossible to conduct for many of these factors, many quantitative studies have sought to identify the determinants of vaccine hesitancy from observational data using descriptive statistics and linear regression models7,8,9,10,11. These studies suggest that perceived COVID-19 risk, conservative-leaning political views, prior vaccine usage and attitudes, and vaccine safety concerns are strong predictors of vaccine hesitancy7,8,9,10,11. A substantial body of qualitative research has used in-depth interviews, social media conversations and content, and open-text survey responses to capture the points of view of the general population15,16,17,18 and of US subpopulations who remained unvaccinated19,20,21 (e.g., Black and Latino Americans), without predetermining those points of view through a prior selection of survey topics. These studies suggest that concerns about potential vaccine side-effects, mistrust of the healthcare system and pharmaceutical companies, financial issues, and myths and misconceptions about COVID-19 affect the intent to get vaccinated. The current study builds on these in-depth explorations of the context of vaccine hesitancy, in an attempt to abstract and structure the hierarchical and interactive nature of these factors in the US.
These studies, however, have some limitations. First, many quantitative studies focus more on sociodemographic differences than on underlying beliefs and barriers4,5,6. Second, correlation does not imply causation22, 23. With observational data, for instance, the risk of misattributing correlation to causation due to confounding (i.e., a spurious association between two variables that arises because a third variable influences both of them) is often acknowledged but is challenging to address22, 24, 25. Although statistical corrections for these effects are possible, they require expert knowledge, and experts may suffer from subjective bias or be prejudiced by their hypotheses25,26,27. Third, many studies focus more on identifying predictors of the outcome of interest7, 9, 14, 28 and less on understanding how these predictors interact at a systems level to give rise to the outcome (causal inference). There are likely multiple intersecting paths and intermediary steps by which causal factors influence behaviours, and understanding them could be critical to inform more nuanced and precise interventions. However, statistical causal inference techniques (e.g., propensity score matching, regression discontinuity design) are not designed to test multiple, connected causal hypotheses simultaneously24 (e.g., A causes B and B causes C, but A also directly causes C). As a result, it is difficult to explore multiple causal pathways of behavioural outcomes with these approaches. A more natural way to represent the multi-causal complexity of human behaviour is a structural causal model (SCM) such as a structural equation model (SEM)8, 28. However, SEM requires the modeller to specify which variables interact and how, which means that the causal structure is specified entirely a priori rather than learnt from data.
Recent advances in machine learning have produced a class of algorithms, known as causal discovery algorithms, that autonomously construct a type of SCM called a causal Bayesian network (BN); the learnt network can be combined with human expert insights or findings from qualitative research as needed. Graphically, a BN is depicted as a directed acyclic graph (DAG)29, 30 (see Fig. 1a for an example). Because it encodes the conditional dependencies amongst variables, a BN lends itself to causal inference: estimating how much any variable would change if the state or value of another variable were changed by an intervention (known as “what-if analyses” or “interventional queries”) (Online Appendix A)31. To distinguish this from typical correlation analysis or predictive models (i.e., observational queries), imagine trying to attribute an increase in vaccination rate (say, “D” in Fig. 1) to 100 extra vaccination sites (“C”). In a predictive model, the observed number of vaccination sites could be the result of a broader intervention such as increased funding (“F”), which also affects other variables such as community outreach (“H”). Therefore, any observed increase in vaccination rate cannot be attributed solely to the number of vaccination sites.
Figure 1. An example of a simple directed acyclic graph (DAG) and an interventional query. (a) Variables are depicted as a set of nodes, and the probabilistic conditional dependencies among variables are depicted by a set of directed arrows. Here, C is dependent on A, B and F, but not on E, D, G or H. B and F are direct causes of C, while A is an indirect cause of C via B. E is not causally related to any other variable. F can affect D via C or H. F is a common cause of both C and D; thus, C’s effect on D is confounded by F. B can be seen as an instrumental variable for the effect of C on D32. (b) Depiction of an interventional query on “C.” This is equivalent to removing the arrows pointing into C, and then estimating the probabilities of different outcomes of another variable due to a change in C.
By contrast, in an interventional query we forcibly set the value of one variable to see what change it would bring to the outcome of interest. In this case, the number of extra vaccination sites no longer depends on any other variable, such as funding. Graphically, this is equivalent to removing all arrows pointing into “C” (i.e., graph surgery) and then estimating the likely value of the outcome variable due to a change in “C” (Fig. 1b). Formally, this is expressed with the do-operator of the do-calculus first proposed by Judea Pearl31. Conceptually, therefore, interventional queries are like comparing treatment groups with control groups in a virtual RCT, whereas observational queries are more akin to using correlations to predict expected observations (without accounting for all confounding biases). This makes interventional queries far more informative for identifying targets for intervention design. Notably, unlike experiments, where a new RCT is needed to test each new causal hypothesis, the same model can be used to explore “causes of cause” if the immediate cause is not directly actionable, or if one would like to investigate upstream or alternative causal paths.
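To make the observational/interventional distinction concrete, the sketch below enumerates a small binary model of the F → C → D ← H ← F fragment of Fig. 1 and compares P(D = 1 | C) with P(D = 1 | do(C)). All probabilities are hypothetical values chosen for illustration, nodes A, B, E and G are dropped, and the do-query is computed by graph surgery (replacing C's mechanism with a fixed value) rather than by any particular BN library.

```python
from itertools import product

# Hypothetical conditional probabilities for a binary version of the
# F -> C -> D <- H <- F fragment of Fig. 1 (F = extra funding,
# C = extra vaccination sites, H = community outreach, D = higher vaccination rate).
P_F1 = 0.5

def p_c1(f):
    """P(C=1 | F=f)."""
    return 0.8 if f else 0.2

def p_h1(f):
    """P(H=1 | F=f)."""
    return 0.9 if f else 0.1

def p_d1(c, h):
    """P(D=1 | C=c, H=h)."""
    return 0.1 + 0.2 * c + 0.4 * h

def bern(p1, x):
    """Probability that a Bernoulli(p1) variable takes value x."""
    return p1 if x else 1.0 - p1

def joint(f, c, h, d, do_c=None):
    """Joint probability; if do_c is given, C's own mechanism is cut (graph surgery)."""
    p = bern(P_F1, f)
    p *= bern(p_c1(f), c) if do_c is None else float(c == do_c)
    p *= bern(p_h1(f), h)
    return p * bern(p_d1(c, h), d)

def prob_d1(c_value, intervene=False):
    """P(D=1 | C=c_value) if intervene is False, else P(D=1 | do(C=c_value))."""
    do_c = c_value if intervene else None
    num = den = 0.0
    for f, c, h, d in product((0, 1), repeat=4):
        if c != c_value:
            continue
        p = joint(f, c, h, d, do_c)
        den += p
        if d == 1:
            num += p
    return num / den

observational = prob_d1(1) - prob_d1(0)
interventional = prob_d1(1, intervene=True) - prob_d1(0, intervene=True)
print(f"observational contrast:  {observational:.3f}")
print(f"interventional contrast: {interventional:.3f}")
```

Here the observational contrast is about 0.39 while the interventional contrast is 0.2: roughly half of the apparent effect of C is carried by the confounder F acting through H, which is exactly what graph surgery removes.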
We conducted the Surgo COVID-19 Vaccine Survey (Online Appendix B) in early 2021, when vaccines were first made widely available in the US, collecting a wide range of psycho-behavioural data on variables that may drive COVID-19 vaccine intention. With this data and BN, we aimed to gain systems-level insights into the complex causal pathways influencing vaccine intent, without a priori model specification. Here we aim to (1) identify the complexities and the mechanisms leading to COVID-19 vaccine intention (i.e., “When a vaccine for COVID-19 is available to you, how likely are you to take it?”), and (2) contrast this with the conclusions that would be reached via logistic regression from the same data. Finally, we suggest several strategies to increase vaccine intent.
Results
The causal pathways are complex for vaccine intention, with beliefs in social responsibility, vaccine safety and anticipated regret forming the most direct causes
The resulting BN of 45 categorical variables from the Surgo COVID-19 Vaccine Survey reveals a rich network of inter-dependencies between the causal factors of vaccine intention: demographics, structural factors, social influences, beliefs and perceptions, emotions, influencers and behaviours (Fig. 2 and Table 1). For convenience, we call any variable that lies upstream of vaccine intention on a directed path a causal factor of vaccine intention. Those that have statistically significant estimated causal effects (p < 0.05) on vaccine intention are called significant causal factors of vaccine intention. For clarity, we omit the non-causal factors (Online Appendix C, Table C3) from Fig. 2, but the full DAG can be found in Online Appendix D.
Figure 2. The BN structure learnt from the Surgo COVID-19 Vaccine Survey data, depicted as a directed acyclic graph (DAG). For clarity, we only include nodes that are direct (immediately upstream) or indirect (further upstream) causes of vaccine intention. Significant causes are those whose change would significantly alter vaccine intention (p < 0.05) in interventional simulations. The direct causes of vaccine intention are mostly beliefs and perceptions, and social norms. Direct causes are in turn driven by remote causes, which consist mostly of structural factors, influencers and outcome expectations.
We found 30 variables that are either direct (immediately upstream) or indirect (further upstream) causes of vaccine intention. Remote causes (i.e., several nodes upstream of vaccine intention) tend to be influencers, structural factors and outcome expectations. These variables in turn drive the more direct causes of vaccine intention, which are dominated by beliefs and perceptions and by social influences (which include social norms and societal expectations). There are four direct causes of vaccine intention with significant causal effects: the belief that vaccines will be unsafe; the belief that one has a social responsibility to get vaccinated for COVID-19 to protect others (social responsibility); feeling regret if one did not take the COVID-19 vaccine and then subsequently contracted COVID-19 (anticipated regret); and willingness to take the COVID-19 vaccine in the first three months of availability (early adopter). Early adopter is a very close proxy for vaccine intention. Variables found not to be causal of vaccine intention include getting information about COVID-19 from left-wing media or from Fox News; having delayed seeking medical care in the past year due to cost; income; and the belief that COVID-19 vaccine testing is rushed.
Unlike regression analysis, we could identify from the DAG several sequences of cause-and-effect mechanisms (a.k.a. causal pathways) through which a given intervention ultimately affects vaccine intention. This suggests that there are multiple means for practitioners to influence vaccine intention directly or indirectly. Social responsibility and anticipated regret are particularly important; in addition to having direct effects on vaccine intention, these causes also mediate the effects of many demographic factors (e.g., age, urbanicity and political affiliation), as well as the effects of perceived risk of COVID-19 and influencers, on vaccine intention. In particular, social responsibility centrally affects several important factors downstream—willingness to be an early adopter, anticipated regret and the belief that vaccines will be unsafe. Thus, social responsibility should be considered a primary target for interventions.
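Reading causal pathways off a DAG amounts to enumerating every directed path from a cause to the outcome, which can be sketched as a depth-first search. The small graph below is hypothetical: its edges are a simplification assembled from relationships described in this section (for example, believing COVID-19 is dangerous to one's community driving social responsibility and anticipated regret), not the learnt BN of Fig. 2.

```python
# Hypothetical, simplified subgraph assembled from relationships described in the
# text. The edges are illustrative only and are NOT the learnt BN shown in Fig. 2.
EDGES = {
    "community danger": ["social responsibility", "anticipated regret"],
    "social responsibility": ["early adopter", "anticipated regret",
                              "vaccine unsafe", "vaccine intention"],
    "anticipated regret": ["vaccine unsafe", "vaccine intention"],
    "vaccine unsafe": ["vaccine intention"],
    "early adopter": ["vaccine intention"],
}

def causal_paths(graph, source, target, path=None):
    """Yield every directed path from source to target (depth-first; assumes a DAG)."""
    path = (path or []) + [source]
    if source == target:
        yield path
        return
    for child in graph.get(source, []):
        yield from causal_paths(graph, child, target, path)

for p in causal_paths(EDGES, "community danger", "vaccine intention"):
    print(" -> ".join(p))
```

On this toy graph, seven distinct pathways run from "community danger" to "vaccine intention"; a single regression coefficient would collapse all of them into one number.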
Further upstream, we found that believing that COVID-19 is dangerous to one’s community is a remote but significant causal factor of vaccine intention that drives many factors downstream: expected chance of catching COVID-19, the level of worry about catching COVID-19, the proportion of the community that I think will take the vaccine, and importantly, anticipated regret and social responsibility mentioned above.
Convincing people to have the social responsibility to get vaccinated has the greatest effect on vaccine intention
To estimate the effect size of a given intervention, we compute the interventional odds ratio (OR), which estimates the multiplicative change in the odds of an outcome given a hypothetical, active change in an upstream variable from a reference level31, 34. Note that in performing the interventional query, we are not simulating an empirical intervention, but a hypothetical change in the value of the specific upstream variable (Online Appendix A).
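A minimal sketch of the odds-ratio arithmetic, assuming the BN has already produced the distributions of vaccine intention under do(X = treatment level) and do(X = reference level). We assume baseline-category odds, P(high)/P(low), as one reading of the "from low to high" framing; the exact definition and the confidence-interval procedure used in the paper are in Online Appendix A and are not reproduced here. The probabilities are hypothetical.

```python
def odds(dist, outcome, baseline):
    """Baseline-category odds: P(Y=outcome) / P(Y=baseline)."""
    return dist[outcome] / dist[baseline]

def interventional_or(dist_do_treat, dist_do_ref, outcome="high", baseline="low"):
    """Odds under do(X=treatment) divided by odds under do(X=reference)."""
    return odds(dist_do_treat, outcome, baseline) / odds(dist_do_ref, outcome, baseline)

# Hypothetical interventional distributions P(intention | do(X = x)) from a fitted BN.
do_high = {"low": 0.1, "moderate": 0.3, "high": 0.6}  # do(X = high)
do_low = {"low": 0.4, "moderate": 0.3, "high": 0.3}   # do(X = low), the reference level

print(round(interventional_or(do_high, do_low), 2))
```

With these made-up numbers the odds of high versus low intention rise from 0.75 to 6, an interventional OR of about 8. For a binary outcome, the same function applies with `baseline` pointing at the complementary category.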
Figure 3 shows the interventional ORs and 95% confidence intervals (with numerical details in Online Appendix D, Tables D1–D7). Convincing people of the social responsibility to get vaccinated (from low to high social responsibility) has by far the largest effect on moving people from low to high levels of vaccine intention (OR = 49.33, 95% CI 38.64–62.99), followed by showing people that they may regret not taking the vaccine (OR = 3.24, 95% CI 2.75–3.81). The effect sizes of these drivers suggest that messages emphasising social responsibility and anticipated regret could be leveraged even more than they currently are, to change the minds of those with low vaccine intent.
Figure 3. The interventional odds ratios of causal factors that have statistically significant estimated causal effects (at the 5% level) on vaccine intention. Bold text in the y-axis labels indicates the reference level of a given variable, and the error bars are the 95% confidence intervals of the odds ratio estimates. The results suggest that making people believe that it is their social responsibility to get vaccinated to protect others, triggering anticipated regret for not taking the vaccine, making people believe that the majority of their community are taking the vaccine, and discouraging the belief that vaccines will be unsafe have the largest effects in increasing vaccine intention from low to high.
Our analysis also suggests that encouraging people to believe that COVID-19 vaccines are safe (OR = 2.94, 95% CI 2.56–3.45), promoting trust in the vaccine development process (OR = 2.24, 95% CI 1.93–2.60), and persuading people that COVID-19 poses a danger to members of their community (OR = 1.42, 95% CI 1.216–1.66) and that the majority of their community will take the vaccine (OR = 2.44, 95% CI 2.08–2.85) would motivate people with low vaccination intention. Similar causal factors were identified, but with smaller interventional ORs, for moving people from moderate to high vaccination intention (Fig. D2 and Tables D8–D14 in Online Appendix D).
The causal factors of social responsibility, the belief that vaccines will be unsafe, and anticipated regret
Variables such as social responsibility, the belief that vaccines will be unsafe and anticipated regret may be seen as vague targets for interventions. We therefore leveraged a particular feature of BNs—that causes of one variable can be outcomes of another—and examined the “causes of cause” of vaccine intention without having to rebuild a new model (“Causes of cause” section, Tables E1–E16 in Online Appendix E).
We found that social responsibility and anticipated regret are driven by a set of factors related to the risk perception of COVID-19 to self and others (e.g., expected chance of getting COVID-19 without vaccines, worry about getting COVID, belief that COVID is dangerous to one's community). Additionally, anticipated regret is driven by a set of drivers related to social considerations and vaccine safety concerns. The individually small effects of these variables suggest that social responsibility and anticipated regret will require a more concerted effort to nudge (i.e., a simultaneous intervention on their multiple causes) as a means to increase vaccine intention.
The belief that vaccines will be unsafe is driven by trust in the vaccine development process, proportion of my reference community that I think will get vaccinated and anticipated regret. This suggests that practitioners can appeal to a mixture of facts (e.g., the clinical trials that vaccines undergo in their development) and emotions (e.g., anticipated regret) in addressing vaccine safety concerns.
Using the same model to study the impact of political affiliation
There is a popular belief, in part propagated by the media, that Republicans are to blame for the low COVID-19 vaccine uptake in the US4. This belief is not without empirical support. The associations between vaccine hesitancy and political affiliation shown by Khubchandani et al. and Viswanath et al.35, 36 and others10, 37, along with our own correlative analysis (Online Appendix F), suggest that Republicans on average have higher vaccine hesitancy than Democrats. However, our causal analysis paints a more nuanced picture of the influence of political affiliation (Online Appendix G). Although social responsibility is an important and direct cause of vaccine intention, it is only weakly influenced by political affiliation (OR = 1.18, 95% CI 1.06–1.32). Instead, the effect of political affiliation on vaccine intention is mostly mediated by worry about catching COVID-19 and the belief that COVID-19 is dangerous to one’s community, both of which are remote and weak causes of vaccine intention. For this reason, the estimated causal effect of political affiliation on vaccine intention is small, and the strong observed association between these variables is likely due to confounders. Upon closer inspection, race and getting COVID-19 information from scientists (a proxy for general trust in science) are significant causal factors of political affiliation (Online Appendix E, Table E16). Although neither factor individually is significantly causal to both political affiliation and vaccine intention, they could still confound the relationship between the two if they were to change simultaneously. Overall, these confounders would lead non-causal analyses to overestimate the effect of political affiliation.
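The logic of this argument (a weak, mediated causal effect that looks strong in a naive association because of confounding) can be illustrated with a hypothetical linear-Gaussian structural model. Here P stands in for political affiliation, M for the mediators (worry, perceived community danger), Z for a confounder such as trust in science, and V for vaccine intention treated as continuous. The coefficients are invented, and the paper's BN is categorical, so this is only a sketch of the reasoning, computed in closed form from the model's covariances.

```python
# Hypothetical linear-Gaussian SCM (coefficients invented for illustration):
#   Z -> P and Z -> V   : Z confounds P and V
#   P -> M -> V         : the only causal path from P to V runs through M
A_ZP = 0.8     # P = A_ZP * Z + e_P, with Var(Z) = 1
VAR_EP = 0.36  # Var(e_P), so Var(P) = 0.64 + 0.36 = 1
B_PM = 0.3     # M = B_PM * P + e_M
C_MV = 0.3     # V = C_MV * M + D_ZV * Z + e_V
D_ZV = 0.6

def causal_effect():
    """Change in E[V] per unit change under do(P): product of path coefficients P->M->V."""
    return B_PM * C_MV

def _moments():
    var_p = A_ZP ** 2 + VAR_EP
    cov_vp = B_PM * C_MV * var_p + D_ZV * A_ZP  # Cov(V, P)
    cov_vz = B_PM * C_MV * A_ZP + D_ZV          # Cov(V, Z)
    cov_pz = A_ZP                               # Cov(P, Z)
    return var_p, cov_vp, cov_vz, cov_pz

def naive_slope():
    """Slope from regressing V on P alone: Cov(V, P) / Var(P)."""
    var_p, cov_vp, _, _ = _moments()
    return cov_vp / var_p

def adjusted_slope():
    """Coefficient on P when the confounder Z is also a regressor (Var(Z) = 1)."""
    var_p, cov_vp, cov_vz, cov_pz = _moments()
    return (cov_vp - cov_vz * cov_pz) / (var_p - cov_pz ** 2)

print(f"causal {causal_effect():.2f}, naive {naive_slope():.2f}, adjusted {adjusted_slope():.2f}")
```

The naive slope (0.57) is more than six times the causal effect (0.09), while adjusting for the measured confounder recovers 0.09. A learnt causal graph serves the same purpose by telling the analyst which variables sit on back-door paths.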
Comparing the findings from BN and a multinomial logistic regression model
With the understanding that correlation alone does not imply causation, we contrast the findings from the BN and a multinomial logistic regression model to highlight the recommendations that we might have reached had the correlative approach been used on the same dataset (Online Appendix C, Tables C1–C3). The multinomial logistic regression model identifies 22 correlates that are significantly associated with vaccine intention, whereas only 11 causal drivers from the BN have significant causal effects. This suggests that some of the significant associations estimated by regression would not be effective targets for interventions, as they are not supported by the causal analysis. There are several discrepancies between the results from the BN and the regression model. First, while the regression model identifies beliefs in conspiracy theories, long-term side-effects from vaccines, and various sources of COVID-19 information as determinants of vaccine intention, the BN identifies these variables either as non-causes or as remote and insignificant causal factors. Second, the belief in various information sources was found to be significantly associated with vaccine intention by the regression model but was not found to be a significant causal factor by the BN. Third, the regression model and the BN differ in the type of vaccine safety concern that they identify as influential for vaccine intention. The regression model identifies the belief that the COVID-19 vaccine is tested for the safety of my race and the belief that COVID-19 vaccine testing is too rushed as important concerns, while the BN suggests that trust in the vaccine development process is the important concern. These differences might be explained by several methodological differences between regression models and BNs, as we describe in the Discussion.
Discussion
Using a rich observational dataset from a nationally representative survey, we identified causal factors of COVID-19 vaccine intent in the United States and their mechanisms using machine learning and BN. We showed how insights generated from this approach can be much more nuanced and informative for intervention design than logistic regression alone.
The discrepancies between the BN and logistic regression models can be explained by differences in complexity and modelling method (unsupervised structure learning in the BN versus modeller-specified formulae). The BN models causal factors and vaccine intention as a complex system of causal interdependencies; thus, it can estimate the direct and indirect causal effects of these factors on vaccine intention. In contrast, regression models often ignore the interdependencies among covariates and (falsely) treat each behavioural predictor as being directly and linearly associated with vaccine intention (in logistic regression, linearly and additively on the log-odds scale unless interaction terms are specified). However, numerous studies have found that behavioural and demographic variables can interact38, 39, and there is no evidence supporting the linearity of the associations between these predictors and vaccine intention. Since additivity and linearity, two important assumptions of (generalised) linear regression models, are likely violated, and since regression models estimate associations rather than causal effects, it is unsurprising to see disparities between the findings of these two methods.
Our study suggests that practitioners could focus on designing interventions that promote social responsibility and anticipated regret, and dispel vaccine safety concerns. These factors have a much larger causal impact on vaccine intent than demographic attributes. Ideally, individual interventions should not be done in isolation, as shown by the interdependencies of various causal factors captured by the BN model.
While there is evidence of success from governments that have stressed the importance of getting vaccinated as a social responsibility and civic duty40, 41, and several studies have shown a positive association between social responsibility and vaccine intent and uptake42,43,44, the magnitude of the estimated effect of social responsibility (an interventional OR of 49.33, compared with 2.94 for the belief that vaccines are safe and 2.44 for believing that the majority of one’s community will take the vaccine) relative to other strategies such as addressing vaccine safety concerns45 and leveraging descriptive social norms46, 47 is notable. Several factors in turn have significant effects on social responsibility: the expected chance of getting COVID-19 without vaccines, the level of worry about catching COVID-19, and the belief that COVID-19 is dangerous to one’s community. Since the individual effect of each of these drivers on social responsibility is small, a combination of interventions that targets them simultaneously may be necessary to promote it.
Less was known about the extent to which anticipated regret affects COVID-19 vaccination decisions. Merely asking whether someone would anticipate regret for not engaging in a health behaviour (e.g., skipping an annual physical exam) has previously been shown to be sufficient to motivate that behaviour48. A meta-analysis49 of 81 studies has shown that anticipated regret is positively associated with various health behaviours, ranging from cancer screening and physical activity to vaccination. Our results show that the causal effect of triggering anticipated regret is greater than that of addressing vaccine safety concerns. Notably, anticipated regret is driven by perceived risk of COVID-19 to self and others, social responsibility, norms and expectations, and trust in vaccine development. Collectively, these results suggest that to convince people that they will feel regret if they do not take the COVID-19 vaccine, they must first be convinced that the negative health or social consequences could be realised; however, this must be done persuasively, in a way that respects people’s needs for autonomy and self-determination, to avoid provoking resistance50.
Our findings support existing interventions that focus on addressing vaccine safety concerns45, such as communicating results of COVID-19 vaccine safety surveillance51 and low incidence of serious health problems, and disseminating accurate vaccine information through various channels40, 47, 52, 53 (posts on social media platforms, recommendations from trusted clinicians, announcements from public health organizations). Moreover, increasing public trust in the vaccine’s development and helping people see that their community is increasingly taking the vaccine (if true) could strengthen positive vaccine safety beliefs54. Interestingly, social responsibility and anticipated regret also increase vaccine intention indirectly by reducing vaccine safety concerns. One explanation is that people who accept social responsibility and/or anticipate regret might be motivated to justify their vaccination decisions by convincing themselves that vaccines will be safe. By indirectly encouraging self-persuasion55, we could potentially change the vaccination behaviour of those resistant to direct vaccine safety messages, especially if they perceive threats to their autonomy and freedom of choice56. In sum, interventions on social responsibility and anticipated regret can serve as complements to the more common approaches of showcasing scientific evidence and social norms in order to further reduce concerns about vaccine safety45.
Our study has several limitations. First, respondents’ vaccine intent was self-reported. When reporting how likely they were to take the vaccine, respondents might have overlooked barriers such as cost and transport. This could explain why most structural barriers (e.g., delayed medical care due to work schedule) and the belief that vaccines are free were not found to be significantly causal. Second, vaccine intention, the outcome variable of this study, is not a perfect proxy for vaccine uptake. Several studies57,58,59,60 revealed that vaccine hesitancy is prevalent even amongst those who were vaccinated for COVID-19. At the same time, according to CUBES61, the behavioural framework used to design the survey for this study, a high vaccine intention does not necessarily lead to vaccine uptake: contextual barriers such as lack of health insurance and paid sick leave62, 63 could prevent individuals with high vaccine intent from getting vaccinated. Third, our data were collected in early 2021; since then, news of new viral variants, booster recommendations64, and breakthrough infections64 could have affected people’s responses. Fourth, although a BN is useful for modelling confounding when the confounding variables are included in the training dataset, it is still subject to potential latent confounders, i.e., confounding variables not captured in the data. Fifth, errors could be introduced into the periphery of the model’s causal structure because the same statistical properties of the data can be represented by multiple DAGs forming a Markov equivalence class, represented by a completed partially directed acyclic graph (CPDAG)31, so some edge directions cannot be determined from the data alone. Lastly, regression-informed feature selection was used to reduce data complexity; however, the choice of regression for this step was arbitrary, and more research should be conducted to test this approach.
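The fifth limitation follows from Markov equivalence. By Verma and Pearl's characterisation, two DAGs encode the same conditional independencies exactly when they share the same skeleton and the same v-structures (unshielded colliders a → c ← b). The sketch below checks this criterion on three-node examples; it is illustrative and is not part of the authors' pipeline.

```python
from itertools import combinations

def skeleton(edges):
    """Undirected version of a set of directed edges (u, v) meaning u -> v."""
    return {frozenset(e) for e in edges}

def v_structures(edges):
    """Unshielded colliders (a, c, b): a -> c <- b with a and b non-adjacent."""
    skel = skeleton(edges)
    parents = {}
    for u, v in edges:
        parents.setdefault(v, set()).add(u)
    found = set()
    for c, ps in parents.items():
        for a, b in combinations(sorted(ps), 2):
            if frozenset((a, b)) not in skel:
                found.add((a, c, b))
    return found

def markov_equivalent(edges1, edges2):
    """Same skeleton and same v-structures (Verma and Pearl's criterion)."""
    return (skeleton(edges1) == skeleton(edges2)
            and v_structures(edges1) == v_structures(edges2))

chain = {("A", "B"), ("B", "C")}      # A -> B -> C
reverse = {("C", "B"), ("B", "A")}    # A <- B <- C
fork = {("B", "A"), ("B", "C")}       # A <- B -> C
collider = {("A", "B"), ("C", "B")}   # A -> B <- C

print(markov_equivalent(chain, reverse), markov_equivalent(chain, fork),
      markov_equivalent(chain, collider))
```

The chain, reversed chain and fork cannot be told apart from observational data alone, whereas the collider can; edges whose direction differs across equivalent DAGs are the ones a CPDAG leaves undirected.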
Despite these limitations, our study supports several existing interventions, shows how they mediate one another, and provides evidence that several key drivers and causal pathways of vaccine intention may have been underemphasised. Our results also demonstrate that BN could be an effective way to explore multiple causal pathways of other complex behavioural problems. Finally, a possible extension of the present study is to use a mixed-methods approach: first inferring precise, and potentially differing, determinants of vaccine intention for subsegments of the population using BN, and then collecting qualitative data from each subsegment to devise effective, targeted intervention strategies that further optimise vaccine uptake.
Methods
The Surgo COVID-19 vaccine survey
To collect data on a broad number of psychobehavioural factors behind COVID-19 vaccine intention beyond a narrow scope of demographic factors, we surveyed a nationally representative sample of 2747 US residents via the National Opinion Research Center (NORC) AmeriSpeak Omnibus Survey Panel from December 21, 2020 to January 4, 2021. We measured people’s vaccine intention by asking them “When a vaccine for COVID-19 is available to you, how likely are you to take it?”. The survey included questions on respondents’ beliefs, risk perceptions, emotions and perceived social norms, along with their demographics (Online Appendix B). Summary statistics on the survey respondents are provided in Online Appendix H.
Surgo COVID-19 vaccine survey weighting
Panel-based sampling weights, which were computed from the inverse probability of the selection from the NORC national frame, were used to create nationally representative sampling weights for the survey data. The panel weights were also raked to external population benchmarks. The weighting was based on 7 variables: age, gender, census division, race/ethnicity, education, housing tenure, and household phone ownership status.
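Raking (iterative proportional fitting) can be sketched as repeatedly rescaling weights so that each weighted margin matches its population benchmark. The sketch below starts from inverse-selection-probability base weights and rakes to two hypothetical margins; the rows, selection probabilities and target shares are invented, and NORC's production procedure (7 raking variables, plus any additional steps it may apply) is not reproduced.

```python
def max_margin_gap(rows, w, targets):
    """Largest absolute difference between weighted and target category shares."""
    total = sum(w)
    gap = 0.0
    for var, shares in targets.items():
        for cat, share in shares.items():
            got = sum(wi for row, wi in zip(rows, w) if row[var] == cat) / total
            gap = max(gap, abs(got - share))
    return gap

def rake(rows, base_weights, targets, max_iter=1000, tol=1e-9):
    """Iterative proportional fitting of weights to marginal target shares.

    rows: list of dicts mapping variable -> category
    base_weights: design weights, e.g. 1 / selection probability
    targets: {variable: {category: population share}}, shares summing to 1
    """
    w = list(base_weights)
    for _ in range(max_iter):
        for var, shares in targets.items():
            total = sum(w)
            cat_total = {}
            for row, wi in zip(rows, w):
                cat_total[row[var]] = cat_total.get(row[var], 0.0) + wi
            # Scale every respondent so this variable's weighted shares hit the target.
            w = [wi * shares[row[var]] * total / cat_total[row[var]]
                 for row, wi in zip(rows, w)]
        if max_margin_gap(rows, w, targets) < tol:
            break
    return w

# Hypothetical respondents, selection probabilities and population benchmarks.
rows = [
    {"gender": "F", "age": "18-44"}, {"gender": "F", "age": "45+"},
    {"gender": "M", "age": "18-44"}, {"gender": "M", "age": "45+"},
    {"gender": "F", "age": "45+"},
]
selection_prob = [0.02, 0.01, 0.04, 0.01, 0.02]
base = [1 / p for p in selection_prob]
targets = {"gender": {"F": 0.51, "M": 0.49}, "age": {"18-44": 0.45, "45+": 0.55}}

weights = rake(rows, base, targets)
print([round(x, 1) for x in weights])
```

Raking only needs each variable's marginal distribution in the population rather than the full cross-classification, which is why it is a common way to align panel weights with external benchmarks.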
Ethics statement
This questionnaire and survey study were reviewed and approved by the Salus Institutional Review Board (protocol number 02) on 30 November 2020 with an original expiration date of 29 November 2021. A renewal request was accepted on 24 November 2021 to extend the expiration date to 24 November 2022. Per NORC procedures, participation is voluntary both when respondents are asked to join the panel and when they are asked to participate in the Surgo COVID-19 Vaccine Survey. Prior to the start of the survey, respondents were given information about its purpose, and they had to confirm that they were over 18 and give their informed consent before taking the survey. No personally identifiable data were transmitted, used, or stored for this analysis, in adherence to the principles of the Declaration of Helsinki. The methods in this study were performed in accordance with all relevant guidelines and regulations.
Autonomous machine learning of the causal Bayesian network
A BN represents the probabilistic conditional dependencies among variables as a set of directed edges (i.e., arrows) in a DAG (Fig. 1). The representation is compact in that a node can be both an outcome of its parents and a cause of the variables to which its arrows point. By tracing the arrows backwards from a given node, one can read off its upstream causes. It is this graphical form that allows a BN to provide a system-level view of the probabilistic interplay among different causal drivers, in domains ranging from disease diagnosis to biomonitoring29, 65, 66.
How do we identify the structure of a DAG in the first place? Rather than specifying it entirely by hand, a more general approach is to learn the DAG structure of a BN automatically using a class of algorithms called causal discovery algorithms. These algorithms learn the conditional dependencies between variables directly from data, in a hypothesis-free manner67. While the mathematical foundations of these approaches are beyond the current scope, several reviews survey common algorithms67,68,69. An important feature is that once the structure is learnt, confounders present in the input dataset can easily be identified as common-cause nodes (Fig. 1). Moreover, partial expert knowledge can be integrated as a prior for structural learning. Until recently, such algorithms were feasible only for problems with few variables, owing to the computational complexity involved. This has changed with research into more efficient machine learning methods and ever more powerful computing hardware70. The evaluation of the BN network structure is described in Online Appendix I.
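Once a DAG has been learnt, identifying the common causes of a candidate cause and an outcome is a graph query. The sketch below uses one simple reading of "common cause": an ancestor of the cause that also reaches the outcome along a directed path that does not pass through the cause. The toy graph and the function names are hypothetical.

```python
# Hypothetical toy DAG (node -> parents): Z -> X, Z -> Y, V -> X, X -> Y.
PARENTS = {"Z": [], "V": [], "X": ["Z", "V"], "Y": ["X", "Z"]}

def ancestors(node, parents):
    """All nodes with a directed path into `node`."""
    seen, stack = set(), list(parents[node])
    while stack:
        p = stack.pop()
        if p not in seen:
            seen.add(p)
            stack.extend(parents[p])
    return seen

def common_causes(cause, outcome, parents):
    """Ancestors of `cause` that also reach `outcome` while bypassing `cause`."""
    # Remove `cause` from the graph so paths running through it are not counted.
    pruned = {n: [p for p in ps if p != cause]
              for n, ps in parents.items() if n != cause}
    return ancestors(cause, parents) & ancestors(outcome, pruned)

# Z confounds X -> Y; V is not a confounder because it reaches Y only through X.
confounders = common_causes("X", "Y", PARENTS)
```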
Algorithms for searching SCMs can be score-based, constraint-based or hybrid. We followed a previously developed proprietary procedure33: a hybrid implementation that uses the constraint-based PC algorithm as an initialisation step and the Quotient Normalized Maximum Likelihood (qNML) as the score for the subsequent score-based Markov chain Monte Carlo (MCMC) optimisation. The computation takes about a day on an Amazon Web Services (AWS) z1d.6xlarge EC2 instance. In structural accuracy tests using synthetic datasets, we have found this hybrid algorithm to outperform common algorithms such as PC alone, even though those can produce results orders of magnitude faster. In some circumstances, computational speed may outweigh the benefit of these accuracy gains.
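The proprietary search procedure is not public, so the sketch below illustrates only the scoring ingredient named above: a local qNML score for a discrete node, written from our reading of the published qNML definition. The node together with its parents, and the parents alone, are each treated as a single categorical variable scored by normalised maximum likelihood (NML), and the local score is the log-ratio of the two. The function names are hypothetical. A score-based MCMC search would sum such local scores over all nodes and accept or reject proposed edge changes based on the resulting score differences.

```python
import math
from collections import Counter

def log_regret(K, n):
    """log C(K, n): normaliser of the NML distribution of a K-category multinomial
    over n observations, via the recurrence C(k+2) = C(k+1) + (n/k) C(k)."""
    if n == 0 or K <= 1:
        return 0.0
    # C(2, n) by direct summation over binomial splits (terms built in log space).
    logs = []
    for h in range(n + 1):
        t = math.lgamma(n + 1) - math.lgamma(h + 1) - math.lgamma(n - h + 1)
        if h:
            t += h * math.log(h / n)
        if n - h:
            t += (n - h) * math.log((n - h) / n)
        logs.append(t)
    m = max(logs)
    c_prev, c_cur = 1.0, math.exp(m) * sum(math.exp(t - m) for t in logs)  # C(1), C(2)
    for k in range(1, K - 1):  # advance to C(3), ..., C(K)
        c_prev, c_cur = c_cur, c_cur + (n / k) * c_prev
    return math.log(c_cur)

def log_nml(columns, cards):
    """log NML probability of the joint configurations of `columns`, treated as a
    single categorical variable with prod(cards) categories."""
    if not columns:
        return 0.0
    n = len(columns[0])
    counts = Counter(zip(*columns))
    loglik = sum(c * math.log(c / n) for c in counts.values())
    return loglik - log_regret(math.prod(cards), n)

def qnml_local_score(child, parents, data, card):
    """Local qNML score: log NML(child and parents) - log NML(parents)."""
    family = [child, *parents]
    return (log_nml([data[v] for v in family], [card[v] for v in family])
            - log_nml([data[v] for v in parents], [card[v] for v in parents]))
```

For example, if a hypothetical variable X is an exact copy of Y, `qnml_local_score("X", ["Y"], data, card)` is far higher than `qnml_local_score("X", [], data, card)`, so a search using this score would favour adding the edge Y → X.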
Although a BN can model confounding when the confounding variables are included in the training dataset, it remains subject to potential latent confounders, i.e., confounding variables not captured in the data. Some algorithms attempt to include information about potential latent confounders between pairs of nodes, most notably the Fast Causal Inference (FCI) algorithm and its variants71,72. Anecdotally, we have found that with a complex graph and a limited dataset, FCI tends to identify latent confounders throughout the graph. While this may well be true, it is not particularly helpful for practitioners.
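The risk posed by a latent confounder can be shown with a small simulation. The variables below are synthetic and hypothetical: an unrecorded variable U drives both X and Y, and there is no X → Y effect at all. Any method that sees only X and Y finds a strong association; conditioning on U (here by residualising both variables on it) removes that association, which is possible only when U is in the data.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 10_000
u = rng.normal(size=n)        # latent confounder, not recorded in the "survey"
x = u + rng.normal(size=n)    # X depends on U only
y = u + rng.normal(size=n)    # Y depends on U only: there is no X -> Y edge

# Marginal association between X and Y is spurious (population value 0.5).
marginal_corr = np.corrcoef(x, y)[0, 1]

# If U were observed, adjusting for it removes the association.
res_x = x - np.polyval(np.polyfit(u, x, 1), u)
res_y = y - np.polyval(np.polyfit(u, y, 1), u)
partial_corr = np.corrcoef(res_x, res_y)[0, 1]  # close to 0
```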
Bayesian network: representation and causal discovery
To infer the drivers of vaccine intention, we used a structural learning algorithm to build a BN model from 2,477 completed survey responses (out of 2,747 total survey responses). Because structural learning is computationally complex, only a limited number of variables (each with a limited number of levels) can be included if the model is to be learnt in a reasonable amount of time and with reasonable accuracy. In addition, given the sample size of our vaccine survey dataset, there is a trade-off between the number of included variables and model performance. In particular, unsupervised learning of where the edges should be and of their directions is an NP-hard problem73, since the number of possible topologies grows super-exponentially with the number of variables included. With just 5 variables, there are 29,281 possible DAGs; with 10 variables, there are more than 4 quintillion. Aside from using a more efficient search algorithm such as Order Markov chain Monte Carlo, an effective way to reduce this complexity is careful selection of input variables. Using a previously developed procedure33, we determined that a BN with 45 variables, each with at most 3 discrete or ordinal levels, achieves the optimal balance of model complexity and performance. We therefore used the feature selection process described in the section below to select 45 variables as inputs for the causal discovery algorithm, and we converted all continuous variables into categorical/ordinal variables with at most 3 levels. For example, vaccine intention (originally on a scale of 0–10) was discretised into 3 levels: Low (0–3), Moderate (4–6) and High (7–10). The expected performance of our model was proxied by several graph metrics33 (Online Appendix I), including V-structure74 precision, recall and F1-score, which we found to be 0.92, 0.79 and 0.82, respectively.
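The three quantitative steps in the paragraph above can be sketched in a few lines: counting labelled DAGs with Robinson's recurrence (which reproduces the 29,281 and the more-than-4-quintillion figures), discretising the 0–10 intention score into the three stated levels, and scoring an estimated graph by V-structure precision, recall and F1. The function names and the toy graphs are hypothetical, and this is not the evaluation code of the procedure cited above.

```python
from itertools import combinations
from math import comb

def count_dags(n):
    """Number of labelled DAGs on n nodes (Robinson's recurrence)."""
    a = [1]
    for m in range(1, n + 1):
        a.append(sum((-1) ** (k + 1) * comb(m, k) * 2 ** (k * (m - k)) * a[m - k]
                     for k in range(1, m + 1)))
    return a[n]

def discretise_intention(score):
    """Map a 0-10 intention rating to Low (0-3), Moderate (4-6) or High (7-10)."""
    if not 0 <= score <= 10:
        raise ValueError("intention must lie on the 0-10 scale")
    if score <= 3:
        return "Low"
    return "Moderate" if score <= 6 else "High"

def v_structures(edges):
    """V-structures a -> c <- b with a and b non-adjacent, as (a, c, b), a < b."""
    parents, adjacent = {}, {frozenset(e) for e in edges}
    for a, b in edges:
        parents.setdefault(b, set()).add(a)
    return {(a, c, b) for c, pas in parents.items()
            for a, b in combinations(sorted(pas), 2)
            if frozenset((a, b)) not in adjacent}

def v_structure_scores(true_edges, est_edges):
    """Precision, recall and F1 of the estimated graph's V-structures."""
    t, e = v_structures(true_edges), v_structures(est_edges)
    tp = len(t & e)
    precision = tp / len(e) if e else 0.0
    recall = tp / len(t) if t else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# Hypothetical toy graphs: the estimate reverses D -> E, losing one V-structure.
true_g = [("A", "C"), ("B", "C"), ("C", "D"), ("E", "D")]
est_g = [("A", "C"), ("B", "C"), ("C", "D"), ("D", "E")]
precision, recall, f1 = v_structure_scores(true_g, est_g)
```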
Note that there are other ways to circumvent the complexity problem, including different structural learning algorithms75, but pre-processing the data is arguably the most straightforward for practitioners. Lastly, we imposed additional constraints as priors on the structure search algorithm so that obviously unreasonable relationships were precluded (e.g., worry about catching COVID-19 causing race).
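One common way to express such constraints is a tiered ordering from which a blacklist of forbidden edges is generated. The sketch below assumes hypothetical variable names and tiers; it does not reproduce the constraint set used in this study. Structure-learning packages such as the R package bnlearn accept a comparable list of forbidden arcs through a `blacklist` argument.

```python
# Hypothetical tiers: a variable may only cause variables in the same or a later tier.
TIERS = {
    "race": 0, "age": 0,                      # demographics
    "covid_worry": 1, "vaccine_safety": 1,    # beliefs and perceptions
    "vaccine_intention": 2,                   # outcome
}

def forbidden_edges(tiers):
    """Blacklist of directed edges (cause, effect) that point from a later tier
    back to an earlier one. Edges within a tier stay allowed."""
    return {(u, v) for u in tiers for v in tiers
            if u != v and tiers[u] > tiers[v]}

blacklist = forbidden_edges(TIERS)
```

A tiered ordering rules out whole classes of implausible edges at once; finer pairwise rules can be added to the same set if needed.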
Linear regression as a feature selection process to prioritise variables to include in the BN
We began our investigation with a weighted least squares (WLS) regression model to establish correlational relationships between 68 behavioural factors and self-reported vaccine intention. The primary purpose of this model is to aid variable selection for the BN, which is the main model of this study.
We trained the WLS model on data from the Surgo COVID-19 Vaccine Survey (n = 2454 complete cases). This is slightly fewer than the 2477 complete cases used as inputs to the causal discovery algorithm because only a subset of the WLS variables was used for causal discovery; requiring complete data on the larger WLS variable set excludes a few more respondents. The variables in the training data for the WLS model are on their original scales (e.g., respondents were asked in the survey to rate how likely they are to get the COVID-19 vaccine on a scale of 0–10, so vaccine intention has a scale of 0–10). The model is of the form:
\[{Y}_{i}={\beta }_{0}+{\beta }^{\top }{X}_{i}+{\varepsilon }_{i}\]
Here \({Y}_{i}\) is the self-reported vaccine intention of individual i (on a scale of 0–10, with 0 indicating that the respondent has extremely low intention of taking the COVID-19 vaccine, and 10 indicating extremely high intention), \({X}_{i}\) is a vector of causal factors that might affect the individual’s vaccine intention, \({\beta }_{0}\) is the intercept, \(\beta\) is the vector of coefficients, and \({\varepsilon }_{i}\) is the error term.
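Weighted least squares chooses the coefficients that minimise the weighted sum of squared residuals \(\sum_i w_i {\varepsilon }_{i}^{2}\). The sketch below solves the corresponding normal equations on synthetic data; the predictors, the weights and the function name are hypothetical, and the noise-free outcome is used only so that the recovered coefficients can be checked exactly.

```python
import numpy as np

def fit_wls(X, y, w):
    """Weighted least squares with an intercept: minimise sum_i w_i (y_i - b0 - b.x_i)^2
    by solving the normal equations (X^T W X) beta = X^T W y."""
    X = np.column_stack([np.ones(len(y)), np.asarray(X, dtype=float)])
    w = np.asarray(w, dtype=float)
    XtW = X.T * w  # same as X.T @ np.diag(w), without building an n x n matrix
    return np.linalg.solve(XtW @ X, XtW @ y)

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))             # two hypothetical predictors
w = rng.uniform(0.5, 2.0, size=200)       # hypothetical observation weights
y = 2.0 + 3.0 * X[:, 0] - 1.0 * X[:, 1]   # noise-free outcome for checking
beta = fit_wls(X, y, w)                   # [intercept, coef_1, coef_2]
```

In practice a library routine such as statsmodels' `WLS` class would normally be used instead, since it also reports the standard errors needed to judge the strength of each association.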
The results from the regression analysis, summarised in Tables J1–J5 in Online Appendix J, informed variable selection for the BN. Variables were selected for the BN based on three criteria: (1) the strength of the variable's association with vaccine intention in our regression model, (2) the amount of corroborating evidence in the existing literature for a relationship between the variable and vaccine intention (or similar outcomes), and (3) the extent to which the variable can readily be manipulated by interventions. The full list of 45 variables included in the BN is given in Table 1 below. To ground the BN in the behavioural context, we used a behavioural framework that we had previously developed (CUBES)61 to assign each variable in the BN a corresponding causal factor category for easier interpretation (Table 1). The primary purpose of this regression-based feature selection is to reduce data complexity. The choice of linear regression is subjective and was made for its simplicity; other procedures, such as mutual information maximisation, could have been used to augment this process.
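As an illustration of the mutual-information alternative mentioned above, candidate factors can be ranked by their estimated mutual information with the outcome. This plug-in estimator and the toy data are hypothetical; it is not part of the selection procedure used in this study, which relied on regression, literature support and manipulability.

```python
import math
from collections import Counter

def mutual_information(x, y):
    """Plug-in mutual information (in nats) between two discrete sequences."""
    n = len(x)
    pxy, px, py = Counter(zip(x, y)), Counter(x), Counter(y)
    # Sum of p(a, b) * log(p(a, b) / (p(a) p(b))), written with raw counts.
    return sum((c / n) * math.log(c * n / (px[a] * py[b]))
               for (a, b), c in pxy.items())

def rank_by_mutual_information(factors, outcome):
    """Factor names sorted from most to least informative about the outcome."""
    scores = {name: mutual_information(vals, outcome) for name, vals in factors.items()}
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical toy data: one factor tracks the outcome, the other is unrelated.
outcome = [0, 0, 1, 1]
factors = {"uninformative": [0, 1, 0, 1], "informative": [0, 0, 1, 1]}
ranking = rank_by_mutual_information(factors, outcome)
```

Unlike a linear regression coefficient, mutual information also picks up non-linear and non-monotonic associations, which suits discretised survey variables.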
Variables excluded from the BN under these criteria included knowledge about the COVID-19 disease, flu vaccine uptake status, feeling depressed in the past three days, and the belief that natural immunity is more effective than vaccine-induced immunity.
Bayesian network: interventional queries
For a Bayesian network, an interventional query, or what-if analysis, estimates the change in outcome values that results from setting another variable (the “intervened” variable, known as the evidence variable) to a given value, while the causal mechanisms of all other variables are held constant. This corresponds to Pearl's do-operator, and the rules for evaluating such queries are known as do-calculus. We estimated the pre-intervention and post-intervention probability distributions of the outcome variable and then computed the interventional odds ratio (OR). The interventional OR estimates how much more likely an outcome is given a change in an evidence variable. For more details on the definition of the interventional OR, please refer to Online Appendix A.
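A minimal numerical sketch of an interventional query follows, assuming a hypothetical three-node network Z → X, Z → Y, X → Y with binary variables and made-up probabilities. Under do(X = x), the arrow into X is cut and Z keeps its own distribution, so the query reduces to averaging P(Y | x, z) over P(z). The OR below is the plain ratio of post- to pre-intervention odds; the exact definition used in this study is given in Online Appendix A and may differ.

```python
# Hypothetical network Z -> X, Z -> Y, X -> Y (all binary); made-up probabilities.
p_z = {0: 0.5, 1: 0.5}                            # P(Z = z)
p_x1_given_z = {0: 0.2, 1: 0.8}                   # P(X = 1 | Z = z)
p_y1_given_xz = {(0, 0): 0.2, (0, 1): 0.4,        # P(Y = 1 | X = x, Z = z)
                 (1, 0): 0.5, (1, 1): 0.7}

def p_y1_do_x(x):
    """P(Y = 1 | do(X = x)): cut Z -> X and average over Z's own distribution."""
    return sum(p_z[z] * p_y1_given_xz[(x, z)] for z in (0, 1))

def odds(p):
    return p / (1.0 - p)

pre = p_y1_do_x(0)                       # about 0.3
post = p_y1_do_x(1)                      # about 0.6
interventional_or = odds(post) / odds(pre)   # about 3.5

# Observational contrast: conditioning on X = 1 also shifts the distribution of Z.
p_x1 = sum(p_z[z] * p_x1_given_z[z] for z in (0, 1))
p_y1_given_x1 = sum(p_z[z] * p_x1_given_z[z] * p_y1_given_xz[(1, z)]
                    for z in (0, 1)) / p_x1  # about 0.66, not 0.6
```

The gap between 0.66 and 0.6 is the confounding by Z that an interventional query removes and a purely correlational estimate retains.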
Comparison between multinomial logistic regression and BN
We compare the findings from the BN and a multinomial logistic regression model to highlight the differences in the conclusions that we would have drawn had we relied on the correlational relationships from the regression model instead of the estimated causal relationships from the BN.
To facilitate this comparison, we built a second regression model, a multinomial logistic regression, using the same variables as the BN, i.e., 44 causal factors discretised to at most 3 levels as independent variables, and vaccine intention with 3 levels (Low, Moderate and High) as the dependent variable. To simplify the comparison, the multinomial logistic regression model is deemed to show a significant association between a causal factor and vaccine intention if any of that factor's multinomial logit estimates is significant at the 5% level. For example, if a “significant” causal factor has two levels (e.g., Low income and High income), then either the multinomial logit estimate comparing respondents with Low income to those with High income for High relative to Low vaccine intention is significant, or the corresponding estimate for High relative to Moderate vaccine intention is significant. Similarly, in the BN, a causal factor is deemed to have a significant estimated causal effect on vaccine intention if any of its interventional ORs is significant. Using these definitions of “significance”, we determined and compared the significant drivers from the regression model and the BN.
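The "any estimate significant" rule can be written as a short decision function. The sketch assumes two-sided Wald tests with a normal approximation for the logit estimates and, as a hypothetical criterion, a confidence interval that excludes 1 for an interventional OR; the estimates and standard errors in the usage lines are made up.

```python
import math

def two_sided_p_value(estimate, std_error):
    """Wald test p-value for H0: coefficient = 0 (normal approximation)."""
    z = estimate / std_error
    return math.erfc(abs(z) / math.sqrt(2.0))

def factor_is_significant(logit_estimates, alpha=0.05):
    """A factor counts as significant if ANY of its multinomial logit estimates
    (e.g. High vs Low and High vs Moderate intention) has p < alpha."""
    return any(two_sided_p_value(b, se) < alpha for b, se in logit_estimates)

def or_is_significant(ci_lower, ci_upper):
    """Hypothetical criterion for an interventional OR: its interval excludes 1."""
    return ci_upper < 1.0 or ci_lower > 1.0

# Made-up (estimate, standard error) pairs for one factor's two logits.
flagged = factor_is_significant([(0.8, 0.4), (0.3, 0.2)])       # z = 2.0 gives p < 0.05
not_flagged = factor_is_significant([(0.2, 0.2), (-0.24, 0.2)])  # both p > 0.05
```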
Data availability
The Surgo COVID-19 Vaccine Survey data that have been used for the present study are not publicly available but are available upon request. Please contact Dr. Sema K. Sgaier at semasgaier@surgohealth.com.
References
CDC. COVID Data Tracker. Centers for Disease Control and Prevention https://covid.cdc.gov/covid-data-tracker (2020).
Kaiser Family Foundation. State COVID-19 Vaccine Priority Populations. KFF https://www.kff.org/other/state-indicator/state-covid-19-vaccine-priority-populations/ (2021).
Larson, H. J. Defining and measuring vaccine hesitancy. Nat. Hum. Behav. 6, 1609–1610 (2022).
Ivory, D., Leatherby, L. & Gebeloff, R. Least Vaccinated U.S. Counties Have Something in Common: Trump Voters. The New York Times (2021).
Malik, A. A., McFadden, S. M., Elharake, J. & Omer, S. B. Determinants of COVID-19 vaccine acceptance in the US. EClinicalMedicine 26, 100495 (2020).
Robertson, E. et al. Predictors of COVID-19 vaccine hesitancy in the UK household longitudinal study. Brain. Behav. Immun. 94, 41–50 (2021).
Caserotti, M. et al. Associations of COVID-19 risk perception with vaccine hesitancy over time for Italian residents. Soc. Sci. Med. 272, 113688 (2021).
Pogue, K. et al. Influences on attitudes regarding potential COVID-19 vaccination in the United States. Vaccines 8, 582 (2020).
Reiter, P. L., Pennell, M. L. & Katz, M. L. Acceptability of a COVID-19 vaccine among adults in the United States: How many people would get vaccinated?. Vaccine 38, 6500–6507 (2020).
Raja, A. S., Niforatos, J. D., Anaya, N., Graterol, J. & Rodriguez, R. M. Vaccine hesitancy and reasons for refusing the COVID-19 vaccination among the U.S. public: A cross-sectional survey. medRxiv 2021.02.28.21252610 (2021). https://doi.org/10.1101/2021.02.28.21252610.
Kricorian, K., Civen, R. & Equils, O. COVID-19 vaccine hesitancy: Misinformation and perceptions of vaccine safety. Hum. Vaccines Immunother. 0, 1–8 (2021).
Brewer, N. T., Chapman, G. B., Rothman, A. J., Leask, J. & Kempe, A. Increasing vaccination: Putting psychological science into action. Psychol. Sci. Public Interest J. Am. Psychol. Soc. 18, 149–207 (2017).
MacDonald, N. E. Vaccine hesitancy: Definition, scope and determinants. Vaccine 33, 4161–4164 (2015).
Soares, P. et al. Factors associated with COVID-19 vaccine hesitancy. Vaccines 9, 300 (2021).
Tibbels, N. J. et al. “On the last day of the last month, I will go”: A qualitative exploration of COVID-19 vaccine confidence among Ivoirian adults. Vaccine 40, 2028–2035 (2022).
Larson, H. J., Lin, L. & Goble, R. Vaccines and the social amplification of risk. Risk Anal. 42, 1409–1422 (2022).
Boucher, J.-C. et al. Analyzing social media to explore the attitudes and behaviors following the announcement of successful COVID-19 vaccine trials: Infodemiology study. JMIR Infodemiol. 1, e28800 (2021).
Roberts, C. H. et al. Vaccine confidence and hesitancy at the start of COVID-19 vaccine deployment in the UK: An embedded mixed-methods study. Front. Public Health 9, (2021).
Mahoney, M. et al. Gearing up for a vaccine requirement: A mixed methods study of COVID-19 vaccine confidence among workers at an academic medical center. J. Healthc. Manag. 67, 206 (2022).
Perez, A. et al. Factors related to COVID-19 vaccine intention in Latino communities. PLoS ONE 17, e0272627 (2022).
Dong, L. et al. A qualitative study of COVID-19 vaccine intentions and mistrust in Black Americans: Recommendations for vaccine dissemination and uptake. PLoS ONE 17, (2022).
Nichols, A. Causal inference with observational data. Stata J. 7, 507–541 (2007).
Wooldridge, J. M. Econometric Analysis of Cross Section and Panel Data. (MIT Press, 2010).
Trochim, W. M. K. & Donnelly, J. P. The Research Methods Knowledge Base. (Atomic Dog, 2006).
McNamee, R. Confounding and confounders. Occup. Environ. Med. 60, 227–234 (2003).
Greenland, S. & Morgenstern, H. Confounding in health research. Annu. Rev. Public Health 22, 189–212 (2001).
Greenland, S., Pearl, J. & Robins, J. M. Causal diagrams for epidemiologic research. Epidemiology 10, 37–48 (1999).
Latkin, C. et al. COVID-19 vaccine intentions in the United States, a social-ecological framework. Vaccine 39, 2288–2294 (2021).
Constantinou, A. C., Fenton, N., Marsh, W. & Radlinski, L. From complex questionnaire and interviewing data to intelligent Bayesian Network models for medical decision support. Artif. Intell. Med. 67, 75–93 (2016).
Holmes, D. E. & Jain, L. C. Introduction to Bayesian Networks. in Innovations in Bayesian Networks: Theory and Applications (eds. Holmes, D. E. & Jain, L. C.) 1–5 (Springer, 2008). https://doi.org/10.1007/978-3-540-85066-3_1.
Pearl, J. Causality: Models, Reasoning and Inference. (Cambridge University Press, 2009).
Brito, C. & Pearl, J. Generalized Instrumental Variables. arXiv:1301.0560 (2012).
Butcher, B. et al. Causal datasheet for datasets: An evaluation guide for real-world data analysis and data collection design using Bayesian networks. Front. Artif. Intell. 4, 18 (2021).
Pearl, J. From Bayesian Networks to Causal Networks. in Mathematical Models for Handling Partial Knowledge in Artificial Intelligence (eds. Coletti, G., Dubois, D. & Scozzafava, R.) 157–182 (Springer US, 1995).
Khubchandani, J. et al. COVID-19 vaccination hesitancy in the United States: A rapid national assessment. J. Community Health 46, 270–277 (2021).
Viswanath, K. et al. Individual and social determinants of COVID-19 vaccine uptake. BMC Public Health 21, 818 (2021).
Fridman, A., Gershon, R. & Gneezy, A. COVID-19 and vaccine hesitancy: A longitudinal study. PLoS ONE 16, e0250123 (2021).
Bruine de Bruin, W., Saw, H.-W. & Goldman, D. P. Political polarization in US residents’ COVID-19 risk perceptions, policy preferences, and protective behaviors. J. Risk Uncertain. 61, 177–194 (2020).
Kerr, J., Panagopoulos, C. & van der Linden, S. Political polarization on COVID-19 pandemic response in the United States. Personal. Individ. Differ. 179, 110892 (2021).
Blauer, B. Evidence-based messaging to increase COVID-19 vaccination uptake | Bloomberg Cities. http://bloombergcities.jhu.edu/news/evidence-based-messaging-increase-covid-19-vaccination-uptake (2021).
Singapore Government. I got my shot to protect my loved ones at the community. gov.sg: Official online communication platform of the Singapore Government http://www.gov.sg/article/i-got-my-shot-to-protect-my-loved-ones-at-the-community (2021).
Liao, Q. et al. Priming with social benefit information of vaccination to increase acceptance of COVID-19 vaccines. Vaccine 40, 1074–1081 (2022).
Yu, Y. et al. Prosociality and social responsibility were associated with intention of COVID-19 vaccination among university students in China. Int. J. Health Policy Manag. 11, 1562–1569 (2022).
Wu, J., Chen, C. H., Wang, H. & Zhang, J. Higher collective responsibility, higher COVID-19 vaccine uptake, and interaction with vaccine attitude: Results from propensity score matching. Vaccines 10, 1295 (2022).
SteelFisher, G. K., Blendon, R. J. & Caporello, H. An uncertain public—encouraging acceptance of covid-19 vaccines. N. Engl. J. Med. 384, 1483–1487 (2021).
Kalam, M. A. et al. Exploring the behavioral determinants of COVID-19 vaccine acceptance among an urban population in Bangladesh: Implications for behavior change interventions. PLoS ONE 16, e0256496 (2021).
Brewer, N. T. What works to increase vaccination uptake. Acad. Pediatr. 21, S9–S16 (2021).
Zeelenberg, M. & Pieters, R. Consequences of regret aversion in real life: The case of the Dutch postcode lottery. Organ. Behav. Hum. Decis. Process. 93, 155–168 (2004).
Brewer, N. T., DeFrank, J. T. & Gilkey, M. B. Anticipated regret and health behavior: A meta-analysis. Health Psychol. Off. J. Div. Health Psychol. Am. Psychol. Assoc. 35, 1264–1275 (2016).
Reynolds-Tylus, T. Psychological reactance and persuasive health communication: A review of the literature. Front. Commun. 4, (2019).
CDC. CDC Vaccine Safety Monitoring. https://www.cdc.gov/vaccinesafety/ensuringsafety/monitoring/index.html (2020).
Schmitzberger, F. F. et al. Identifying Strategies to Boost COVID-19 Vaccine Acceptance in the United States. https://www.rand.org/pubs/research_reports/RRA1446-1.html (2021).
Finney Rutten, L. J. et al. Evidence-based strategies for clinical organizations to address COVID-19 vaccine hesitancy. Mayo Clin. Proc. 96, 699–707 (2021).
Agranov, M., Elliott, M. & Ortoleva, P. The importance of Social Norms against strategic effects: The case of Covid-19 vaccine uptake. Econ. Lett. 206, 109979 (2021).
Bernritter, S. F., van Ooijen, I. & Müller, B. C. N. Self-persuasion as marketing technique: The role of consumers’ involvement. Eur. J. Mark. 51, 1075–1090 (2017).
Steindl, C., Jonas, E., Sittenthaler, S., Traut-Mattausch, E. & Greenberg, J. Understanding psychological reactance. Z. Psychol. 223, 205–214 (2015).
Willis, D. E. et al. Hesitant but vaccinated: assessing COVID-19 vaccine hesitancy among the recently vaccinated. J. Behav. Med. 1–10. https://doi.org/10.1007/s10865-021-00270-6 (2022).
Purvis, R. S. et al. Trusted sources of COVID-19 vaccine information among hesitant adopters in the United States. Vaccines 9, 1418 (2021).
Moore, R. et al. The vaccine hesitancy continuum among hesitant adopters of the COVID-19 vaccine. Clin. Transl. Sci. 15, 2844–2857 (2022).
Ward, J. K. et al. The French health pass holds lessons for mandatory COVID-19 vaccination. Nat. Med. 28, 232–235 (2022).
Engl, E. & Sgaier, S. K. CUBES: A practical toolkit to measure enablers and barriers to behavior for effective intervention design. Gates Open Res. 3, (2020).
Mishra, A., Sutermaster, S., Smittenaar, P., Stewart, N. & Sgaier, S. K. COVID-19 Vaccine Coverage Index: Identifying barriers to COVID-19 vaccine uptake across U.S. counties. medRxiv 2021.06.17.21259116 (2021). https://doi.org/10.1101/2021.06.17.21259116.
Kolobova, I. et al. Vaccine uptake and barriers to vaccination among at-risk adult populations in the US. Hum. Vaccines Immunother. 18, 2055422 (2022).
Maragakis, L. & Kelen, G. D. Breakthrough infections: Coronavirus after vaccination. https://www.hopkinsmedicine.org/health/conditions-and-diseases/coronavirus/breakthrough-infections-coronavirus-after-vaccination (2021).
de Vries, J., Kraak, M. H. S., Skeffington, R. A., Wade, A. J. & Verdonschot, P. F. M. A Bayesian network to simulate macroinvertebrate responses to multiple stressors in lowland streams. Water Res. 194, 116952 (2021).
Kahn, C. E., Roberts, L. M., Shaffer, K. A. & Haddawy, P. Construction of a Bayesian network for mammographic diagnosis of breast cancer. Comput. Biol. Med. 27, 19–29 (1997).
Glymour, C., Zhang, K. & Spirtes, P. Review of causal discovery methods based on graphical models. Front. Genet. 10, 524 (2019).
Squires, C. & Uhler, C. Causal structure learning: A combinatorial perspective. Found. Comput. Math. https://doi.org/10.1007/s10208-022-09581-9 (2022).
Kitson, N. K., Constantinou, A. C., Guo, Z., Liu, Y. & Chobtham, K. A survey of Bayesian Network structure learning. Artif. Intell. Rev. https://doi.org/10.1007/s10462-022-10351-w (2023).
Nagarajan, R., Scutari, M. & Lebre, S. Bayesian networks in R with applications in systems biology (Springer, 2013).
Spirtes, P. L., Meek, C. & Richardson, T. S. Causal inference in the presence of latent variables and selection bias. https://doi.org/10.48550/arXiv.1302.4983 (2013).
Zhang, J. On the completeness of orientation rules for causal discovery in the presence of latent confounders and selection bias. Artif. Intell. 172, 1873–1896 (2008).
Chickering, D. M., Heckerman, D. & Meek, C. Large-sample learning of Bayesian networks is NP-hard. J. Mach. Learn. Res. 5, 1287–1330 (2004).
Heckerman, D. A Tutorial on learning with Bayesian networks. in Innovations in Bayesian Networks: Theory and Applications (eds. Holmes, D. E. & Jain, L. C.) 33–82 (Springer, 2008). https://doi.org/10.1007/978-3-540-85066-3_3.
Gendelman, R. et al. Bayesian network inference modeling identifies TRIB1 as a novel regulator of cell-cycle progression and survival in cancer cells. Cancer Res. 77, 1575–1585 (2017).
Acknowledgements
We thank Sofia Braunstein, Eli Grant, Neela Saldanha and Aysha Keisler for their contributions to the development of the survey. We also thank Jessica Barker, Grace Charles, Peter Smittenaar and Aaron Dibner-Dunlap for their insightful comments on earlier drafts of this manuscript.
Funding
This study was funded by the Surgo Foundation.
Author information
Contributions
S.K.S. conceptualised the study. H.F. and V.S.H. designed the study and analysed the data. H.F. and V.S.H. wrote the final manuscript with input from all authors. All authors contributed to the interpretation of the results.
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Publisher's note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Fung, H., Sgaier, S.K. & Huang, V.S. Discovery of interconnected causal drivers of COVID-19 vaccination intentions in the US using a causal Bayesian network. Sci Rep 13, 6988 (2023). https://doi.org/10.1038/s41598-023-33745-4
DOI: https://doi.org/10.1038/s41598-023-33745-4





