Abstract
What causes viewers to change their winner perception during a televised debate? The article addresses this question, drawing on a large-N field study of the 2021 chancellor debate in Germany, which contains survey and real-time response data for 4613 participants. Using machine learning techniques, we identify determinants of why participants change their opinion about who is winning the discussion during the debate. Our analysis based on random forest and decision tree models shows in detail, first, what factors drive debate winner perceptions in the course of televised debate reception. Second, we reveal what combinations of political predispositions and candidate statements are necessary to change the viewers’ debate winner perception. In doing so, third, we expand the toolbox of empirical debate research with our analysis based on machine learning algorithms. Our findings indicate that pre-debate chancellor preference and candidate images play a crucial role in determining post-debate perception change, while party identification is less important in predicting changes. We can also directly identify several speech moments in the debate that shifted viewer perception, a novel approach to evaluating political debate performance.
Introduction
The 2021 federal election in Germany was special in three respects: first, for the first time in the history of the Federal Republic, no incumbent ran for re-election. Angela Merkel had announced her retirement after having served as chancellor for 16 years. Second, three parties (CDU/CSU, SPD, and Greens) were genuinely competing to win the election, with each party taking the lead in the polls within a few months. Consequently, third, three candidates were competing for the chancellorship with realistic chances: Annalena Baerbock (Greens), Armin Laschet (CDU), and Olaf Scholz (SPD).
In this political configuration, political communication during the election campaign can be said to play an important role in individuals’ voting decisions, and thus also in the outcome of the election (Maier and Faas, 2011). Alongside digital campaigning on the Internet and social media, televised debates were given a unique role in political communication. This is evident from the fact that, due to the pandemic and novel political conditions, not one or two duels as in previous years, but a total of three Triells were broadcast by different stations immediately before the Bundestag election. With 4–11 million viewers, these TV events achieved a wide reach.
With 11 million viewers, the second TV debate was a decisive moment in the campaign for all candidates. Public opinion could still shift considerably in the remaining two weeks until the election. The event was transmitted on public television and online via public streaming, allowing for maximum media reach and accessibility.
But who won the debate, and why? These are probably the most discussed questions in news coverage and among viewers immediately after a televised debate. Representative polls, talk shows with experts and party representatives, and interviews with viewers try to fathom both issues, as does empirical debate research, since debate winner perception has been found in previous research to be highly influential on subsequent factors of electoral decisions (Maier et al., 2022; Schrott and Lanoue, 2013; Maier and Faas, 2011; Aalberg and Jenssen, 2007; Reinemann and Maurer, 2005). Studies indicate that this is especially true for countries without strong partisan identification, in contrast to the American context (Lloyd et al., 2020; Aalberg and Jenssen, 2007).
Recent research in the German context has shown that the perception that a candidate has won a debate is essentially shaped by three factors: party identification, expectations about the debate winner, and the perceived debate performance. The latter has the strongest effect, while the first factor is mainly mediated by perceived debate performance. Pre-debate expectations about who will win the discussion only play a minor role. Although perceived debate performance is thought to play a key role in perceptions of the debate winner, studies assessing the impact of individual candidate statements are scarce, even though there is evidence that viewers rely on the candidate’s rhetorical performance, arguments, and policy statements rather than on non-verbal communication (Blumenberg et al., 2017; Nagel et al., 2012; Maurer, 2016; Reinemann and Maurer, 2005; Mazara, 2013).
Our article adds to this research strand with a novel perspective. We exploit our large-N and diverse data, which combine real-time response (RTR) measures of how viewers perceived the candidates’ statements on different policy issues with survey data collected immediately before and after the second TV-Triell from more than 4600 participants. This allows us to assess not just post-debate perceptions of the debate winner, which is the common approach because debate research is dominated by RTR studies with only several dozen participants and thus little variance, but changes in the viewers’ verdicts on the debate winner between the pre- and post-debate surveys. Additionally, we do not rely on an average evaluation of the candidates during the whole debate; instead, we zoom in on distinct statements of the candidates about individual policy issues during the discussion and assess their impact on shifts in winner perceptions while considering political predispositions. We thus link to the controversy in the literature on whether the reception of debates is determined by selective perceptual processes (Festinger, 1957) or whether candidates are in fact able to substantially influence viewers’ perceptions and judgments by what they say (Redlawsk et al., 2010).
Using random forest and decision tree models on our survey and real-time response data from more than 4600 viewers of the second TV-Triell of the 2021 federal election in Germany, our analysis shows in detail, first, what factors drive change in debate winner perceptions in the course of televised debate reception. Second, we reveal what combinations of political predispositions and candidate statements are necessary to change the viewers’ debate winner perception. Third, we expand the toolbox of empirical debate research with an analysis based on machine learning techniques, which allows us, for the first time, to directly evaluate the impact of speech moments and viewer predispositions simultaneously.
The results indicate that pre-debate winner expectation, evaluations of credibility, sympathy, and competence all play an important role in determining a change in the post-debate evaluation. But the study also reveals that these factors are connected to the perception of crucial speech moments of candidates and their performance in the debate.
The article proceeds as follows: In the next section, we review the literature, focusing on contributions that relate to verdicts on the debate winner in the German context. Section “Methods and measures” then turns to the methods and measures used, followed by a presentation of our findings in section four. A discussion about implications and limitations of our findings concludes the paper.
Literature review
The effect of pre-existing preferences
In research about televised debates, pre-existing candidate preferences (in reference to mechanisms of cognitive dissonance theory) are considered one of the most important cognitive filters that presumably influence the way viewers perceive and assess candidates in televised debates (Schrott and Lanoue, 2013; Lang and Lang, 1962; Katz and Feldman, 1962; Holbrook, 1996; Sears and Chaffee, 1979; Lanoue and Schrott, 1991). In short, viewers possess a strong tendency to regard the candidate they preferred before the debate as the winner of the debate even in the aftermath. Moreover, debate research has shown that not just candidate preferences but also perceptions of the candidates’ images, i.e., their qualities such as credibility and competence, influence viewers’ post-debate verdicts (Otto et al., 2015).
In accordance with the theory of cognitive dissonance (Festinger, 1957), one may argue that information is often only actively processed if it corresponds to pre-existing political orientations, knowledge, attitudes, and behaviors. So-called selective exposure is particularly effective with regard to the selection of media content. Thus, people have strong incentives to predominantly consume media content that corresponds to their existing beliefs because it is found to be pleasant. Meanwhile, TV debates between leading candidates of different parties are considered to be a format in which political information is presented with little distortion, and different perspectives on political problems are provided (Warner and McKinney, 2013). Viewers consequently cannot choose the information they receive, which contradicts the mechanism of selective exposure. But of course, viewers can watch debates with strong predispositions and accordingly evaluate candidate performance based on their previously established biases.
Here also a second mechanism comes into play: selective perception describes a process in which dissonant information is edited in such a way that it does not harm one’s own belief system (Maier and Faas, 2003). For example, the quality of counterarguments can be inflated, the legitimacy of the speaker can be doubted, or the importance of the argument can be downplayed (Westen, 2008). In short, debate reception is a rationalizing process rather than a process in which new information is objectively evaluated and existing attitudes and behaviors are correspondingly adjusted (Knobloch-Westerwick, 2008). In this respect, political dispositions function as a powerful filter in the perception of debates and, consequently, in the evaluation of candidates and their performance.
Yet the mechanism of selective perception is not unbreakable (Redlawsk et al., 2010). Studies show that there is a tipping point beyond which even strong partisans begin to adjust existing attitudes and behaviors in line with new, dissonant information (König et al., 2021). This applies especially when party loyalties are weak or even non-existent. Viewers with weak or no party loyalties are considered to be particularly receptive to campaign messages and information (Waldvogel et al., 2023). However, the literature remains divided on the extent to which certain statements and rhetorical strategies of political candidates systematically influence the responses and judgments of viewers after the debate.
In order to assess whether a given piece of information has a consonant or dissonant character, heuristics are used. Party identification as a stable affective tie to a party (Campbell et al., 1960) is considered to be a particularly important heuristic for dealing with political information. According to the theory of cognitive dissonance, we can assume that a match in the party identification of the candidate and the recipient facilitates information processing.
Alongside the aforementioned factors, pre-debate expectation about the outcome of the debate is considered to be another important driver. Here, too, the argument of selective information processing applies: if a candidate is expected to outperform his opponents, he or she should systematically receive more favorable ratings during and after the debate with regard to his or her debate performance. There is empirical evidence for this relationship in the German context, although these associations have been found to be fairly weak (Bachl, 2013; Maier and Strömbäck, 2010; Schrott and Lanoue, 2008). While the significance of the relationship could indicate that pre-debate attitudes are quite powerful in filtering the perception of the debate, the fact that this relationship is rather weak indicates that they are far from being determinative.
That being said, the first question is to what extent political predispositions (party identification, candidate preference, candidate images, winner expectations) filter the perception of candidate statements or whether candidates are able to connect with viewers and even win them over regarding their verdicts on the winner of the debate. A second important question is whether the statements that are crucial for the (changes in) verdicts share common characteristics.
The effect of the political debate
Studies show that televised debates impact the image of candidates since the electorate has the possibility to judge the front-running candidates based on political qualities during the debate (Benoit et al., 2003; Pattie and Johnston, 2011; Maier et al., 2014; McKinney and Warner, 2013). However, recent studies show that candidate images are not only subject to debate effects, but that valence perceptions of the candidates influence post-debate judgments, at least in the short run (e.g., Lindemann and Stoetzer, 2021). Since none of the three candidates in the 2021 German federal election was an incumbent, it can be assumed that perceptions of the candidates’ images are an important factor in explaining who is judged to be the debate winner.
According to the theory of political campaign discourse (Benoit, 2007, chapter 2), candidates can use three rhetorical strategies: they can rely on attacks, acclaim, or defense. The question of whether certain debate strategies are systematically associated with certain audience reactions remains controversial in current literature. Although acclaims are quantitatively the most widely used strategy in TV debates in Germany (Maier and Faas, 2003; Reinemann and Maurer, 2005; Maier et al., 2014; Maier and Faas, 2015), the systematic nature of their immediate perception has hardly been studied; the same applies to defensive strategies (Bachl, 2013; De Nooy and Maier, 2015; Jansen and Glogger, 2017; Maier and Faas, 2015).
By contrast, attacks have received significantly more attention in empirical debate research. However, the overall findings are ambiguous: while some studies fail to observe any effects of attacks on viewers’ immediate perceptions (De Nooy and Maier, 2015), others consistently record negative reactions (Nagel, 2012, p. 195), seemingly contradicting those studies that assume positive effects for the attacker (Bachl, 2013; Jansen and Glogger, 2017; Maier and Faas, 2015; Nagel et al., 2012), especially when the attacked person defends himself against the accusation (De Nooy and Maier, 2015; Spieker, 2011). In this context, it has also been shown that the perception of an attack depends on the specific content of the negative message and that characteristics of the recipient (e.g., party identification and personality traits) moderate the reaction to an attack (Bachl, 2013; De Nooy and Maier, 2015; Jansen and Glogger, 2017; Maier and Faas, 2015; Nagel et al., 2012; Boussalis et al., 2021). It remains unclear whether candidate statements decisive for their perception as debate winner share common characteristics in terms of rhetorical strategies; this article will attempt to partly bridge this gap.
Fulfilled or disappointed expectations of viewers are another factor in the evaluation of post-debate winner perception relevant for this research. Previous studies show that less committed or indecisive voters react differently to fulfilled and disappointed expectations than voters with strong party identification (Maier et al., 2022, p. 254; Reinemann and Maurer, 2005, p. 789).
In order to assess the impact of candidate statements on changes in the individuals’ perception of the debate winner, it is necessary to quantify the audience’s evaluation of the candidates. For this purpose, our study uses real-time response measures. Participants had the opportunity to evaluate the statements of the candidates in the TV debate in real time with an application on their smartphone in a natural situation. The idea of this approach is that real-time evaluations result in an overall judgment, so candidates should aim to develop policy statements and rhetorical strategies that evoke positive responses among viewers in order to be retrospectively considered the winner. Evidence from recent studies indicates that what matters for the perception of debate performance is not just what candidates say (Blumenberg et al., 2017; Otto et al., 2015; Jansen and Glogger, 2017), but how they say it (De Nooy and Maier, 2015; Schill and Kirk, 2014).
Summarizing the current state of research
Our measurement of pre-debate winner expectation is a collection of different aspects. It contains the expectation of viewers that their preferred candidate or party will succeed. It also can contain a certain degree of expectation about the rhetorical performance of all participants. After all, viewers can expect politically distant candidates to win.
Previous research indicates that it is mainly the evaluation of the candidates during the debate that determines the post-debate winner perception, whereas political predispositions seem to play a subordinate role (Yawn et al., 1998; Reinemann and Maurer, 2005). Whether this finding also applies to changes in perceptions about the debate winner is unclear due to a lack of studies on this dynamic phenomenon, a research gap that this article tries to address.
We argue that predicting change in perception during the debate—as we do in our setup—is a harder but maybe more rewarding task compared to focusing only on predicting the post-debate evaluation of participants. Change can be triggered by disappointment with a previously favored candidate, new information about political goals and agendas, or even the rhetorical skills of candidates. We show that with the help of machine learning techniques and tree models, scholars can identify individual pathways for different viewer groups and differentiate what message works for what sub-group to trigger a change in their debate winner evaluation.
There is an active discussion in debate literature on the impact of an individual’s political predisposition on their perception of the debate winner and whether the candidates can reach the audience with their statements regarding single policy issues. The potential ability of candidates to persuade viewers to view them as the winner of the debate might have a significant impact on the viewers’ subsequent decisions regarding the election (Aalberg and Jenssen, 2007), such as their candidate preference or voting intention. It is also unclear whether candidate statements relevant for (changes in) the perception of the debate winner share specific rhetorical characteristics. We address both issues in our analysis and outline our methods and measures in the following sections before presenting our results in section four.
Hypotheses
Based on theory and the literature reviewed above, we test the following two main hypotheses:
Hypothesis 1: The pre-debate dispositions of viewers determine the probability of change in debate winner perception.
The measured ex-ante variables evaluating viewer predisposition are pre-debate winner preference, pre-debate chancellor preference, party affiliation, credibility, sympathy, competence, and leadership evaluations of all three candidates as well as the age of viewers.
Hypothesis 2: Agreement and disagreement with key speech moments determine change in post-debate winner evaluation. Agreement and disagreement are measured with real-time response data.
Methods and measures
Sample
Over 11,000 users logged into the web application in total. Almost 9000 people passed through the tutorial and filled out the pre-survey. Just over 8000 people provided at least one real-time rating. From these participants, about 4700 people participated in continuous activity (see Waldvogel and Metz, 2020) and also completed the post-survey. Excluded from this sample were those individuals who appeared to be using a script for automated entry of real-time ratings (more than 60 clicks per second), those who were manually flagged as spammers (more than 30 clicks in any minute of the duel), and individuals who drew attention based on their IP address. Thus, 4613 participants remain in the data set and, with their real-time and survey responses, form the basis for the subsequent analysis of the immediate perception of the debate and the debate winner.
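The click-based exclusion rules described above can be illustrated with a minimal Python sketch. Only the two thresholds come from the text; the function name, the data layout (a list of click timestamps in seconds per participant), and the example participants are hypothetical. The sketch also automates the per-minute check, which the text describes as manual flagging, and leaves out the IP-based screening.

```python
from collections import Counter

# Thresholds quoted in the text; everything else is illustrative.
MAX_CLICKS_PER_SECOND = 60   # above this: presumed automated script
MAX_CLICKS_PER_MINUTE = 30   # above this: flagged as spammer


def is_excluded(click_times):
    """Return True if a participant's click timestamps (seconds since the
    start of the debate) exceed either threshold."""
    per_second = Counter(int(t) for t in click_times)
    per_minute = Counter(int(t) // 60 for t in click_times)
    too_fast = any(n > MAX_CLICKS_PER_SECOND for n in per_second.values())
    too_many = any(n > MAX_CLICKS_PER_MINUTE for n in per_minute.values())
    return too_fast or too_many


# Hypothetical participants: p1 rates occasionally,
# p2 clicks 40 times within a single minute.
participants = {
    "p1": [5.0, 62.3, 130.8, 400.1],
    "p2": [60 + i for i in range(40)],
}
kept = [pid for pid, times in participants.items() if not is_excluded(times)]
```

In this toy input only `p1` would remain in the analysis sample.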
The socio-demographic characteristics show that the sample is relatively young (56% under 40 years of age), somewhat male-dominated (male: 55%, female: 43%, diverse: 1%) and characterized by a high level of formal education (58% with tertiary education) and strong interest in politics (84% with strong or very strong general interest in politics). In addition, relative to representative election polls at the time of the Triell, individuals with a voting intention for the Greens (39%) are overrepresented compared to those with a preference for the Social Democrats (16%) or the CDU/CSU (17%).
Participants from all over the country were recruited through cooperation with about 20 newspapers and from the “PolitikPanel Deutschland” hosted by the University of Freiburg, Germany. We have to acknowledge that this convenience sample does not correspond to the general population in Germany. Thus, representative inferences are not possible. We are also aware that the audience of televised debates differs considerably in its sociodemographic characteristics from the general population in Germany (see Waldvogel 2020). Unfortunately, we do not yet have information from the German election studies on the composition of the Triell’s audience. However, it is not the goal of this analysis to derive general inferences on the overall population or on all viewers. The focus lies on the actual study participants and the testing of our theoretical expectations. For this purpose, our sample provides many cases and high heterogeneity so that emerging subgroups include a sufficient number of participants. Even though a convenience sample cannot guarantee that the estimates are valid, previous debate research indicates that valid estimates can be obtained even if the sample differs substantially from the total population (Boydstun et al., 2014a; König et al., 2021). As a robustness check for the effect of the convenience sample, we also generated a machine-learning model using only TV viewers, excluding online streamers. Results of these models are presented in the online annex. This subsample is on average older, more conservative, and closer to the average German voter. Many findings remain stable between both iterations, and we discuss differences in the additional materials. We are therefore confident that our real-time measures and survey data provide a good basis for estimating the relationships of interest.
Stimulus
The second Triell between Annalena Baerbock (Greens), Armin Laschet (CDU/CSU), and Olaf Scholz (SPD) took place on the evening of September 12, 2021. In the live broadcast, which lasted for more than 90 minutes, the chancellor candidates discussed important issues in the election campaign, including COVID-19 and climate policy, as well as topics such as social justice, finance, taxes, and the economy. Around 11 million viewers watched the program on the public broadcasters ARD and ZDF.
Device
During the Triell, the viewers had the opportunity to rate the debaters live from home and in real time via the Internet with their own (mobile) device using a web application. For this purpose, the so-called Debat-O-Meter provides a linear module structure derived from classical laboratory experimental settings of empirical debate research in order to ensure the standardization of the data collection process and the internal validity of the measurement procedure: after registering, a tutorial appears which includes instructions on the use of the app and its form of measurement. This is followed by a preliminary survey on the socio-demographic variables of the users, as well as their political attitudes and expectations regarding the debate. The core of the application is the RTR module (Fig. 1). This allows users to rate the debaters on a scale from “++” for a very good impression to “−−” for a very bad impression in every second of the debate. Inactivity is counted as a neutral impression according to the measurement manual. The data from this “virtual lab” is immediately stored on a server together with a time stamp and pseudonym. For statistical analysis, this input is recoded into a range of values from −2 to +2. After the debate, the participants are forwarded to the post-debate survey. Finally, they receive an overview of their own evaluation behavior for each candidate during the entire debate and on the various topics of the evening. A complex security architecture and user monitoring complete the functional scope of the Debat-O-Meter.
Methods
Decision tree models
Decision tree models are used in machine learning settings for the automatic classification of objects according to binary decision rules (Breiman et al., 1984). They are a computer-based supervised learning method that uses the available data to find binary stochastic rules that can be applied to maximize the frequency of a certain target category or class in a sub-sample. The algorithm provides chains of simple yes and no conditions that lead to a high frequency of the investigated outcome. We find, for example, that the presence of some RTR-response variables above and below certain thresholds in the response pattern of viewers massively increases the chance of a shift towards a certain candidate. A detailed description of the decision tree algorithm is provided in the appendix (Section 8).
Furthermore, to configure the splitting mechanism we define a complexity parameter (CP) whereby any split that does not decrease the overall lack of fit by a factor of CP is not attempted (Therneau and Atkinson, 2019). CP regulates how incremental individual splits can be and thus how complex tree models can grow. If the CP is set to higher values tree models become less complex and only the most important combination pathways to the target variable are tried. In our analysis, we experimented with different CP-values and arrived at a good balance between complexity and sensitivity at CP values of 0.01–0.015.
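The CP rule quoted above can be made concrete with a small Python sketch. It is a simplified reading, not the implementation used in the study: it assumes that "lack of fit" is the number of cases outside a node's majority class and that the decrease is measured relative to the unsplit root node. All function names and the toy data are hypothetical.

```python
from collections import Counter


def misclassified(labels):
    """Number of cases outside the node's majority class (the 'lack of fit')."""
    if not labels:
        return 0
    return len(labels) - Counter(labels).most_common(1)[0][1]


def split_is_attempted(parent, left, right, root_risk, cp):
    """Keep a split only if it lowers the lack of fit by at least cp,
    measured relative to the lack of fit of the unsplit root node."""
    gain = misclassified(parent) - misclassified(left) - misclassified(right)
    return gain / root_risk >= cp


# Hypothetical root node: 1 = perception changed, 0 = no change.
root = [1] * 40 + [0] * 60
root_risk = misclassified(root)

# A split that isolates many changers clears the threshold ...
strong = split_is_attempted(root, [1] * 30 + [0] * 10, [1] * 10 + [0] * 50,
                            root_risk, cp=0.01)
# ... while a split that leaves both sides as mixed as the root does not.
weak = split_is_attempted(root, [1] * 20 + [0] * 30, [1] * 20 + [0] * 30,
                          root_risk, cp=0.01)
```

Raising `cp` makes the test harder to pass, which is why higher values yield smaller trees.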
Random forest
Random forest is a tree-based ensemble learning method originally developed by Tin Kam Ho (1995) and refined and further developed by Leo Breiman (2001) and Cutler et al. (2012). It is currently one of the most popular machine learning techniques applied to prediction tasks across a multitude of scientific disciplines and businesses. It generates multiple instances of the aforementioned decision tree models from randomized data to learn configurational aspects of a dataset in a non-linear way. The process used in generating many trees is called ‘bootstrap aggregation’, or ‘bagging’ for short, and was originally developed by Breiman to avoid overfitting and to reduce variance in the statistical learning method (Breiman, 1996; James et al., 2013, p. 316). ‘Bagging’ describes the process of generating randomized sub-samples of the original data distribution drawn with replacement to create independent predictions (Breiman, 1996, p. 123).
This results in a “forest” of randomized decision trees, all trained on different aspects of the overall data. In order to produce a consolidated result with this forest, all individual tree models take part in a majority vote that determines the favored solution for the prediction problem in a given classification task. Once this forest is created, it can be stored as a learned model to predict cases whose category is not yet known. Technical details are provided in the appendix (Section 8).
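Bagging and majority voting can be sketched in a few lines of Python. This is an illustration under simplifying assumptions rather than the study's model: each "tree" is a single-split stump, the random feature selection that a full random forest adds is omitted, and the two features (a net RTR score and a prior-preference flag) and all data are made up.

```python
import random
from collections import Counter


def fit_stump(rows):
    """Fit a one-split tree: choose the (feature, threshold) whose two sides,
    each predicting its majority class, misclassify the fewest rows."""
    best = None
    for f in range(len(rows[0][0])):
        for threshold in sorted({x[f] for x, _ in rows}):
            left = [y for x, y in rows if x[f] <= threshold]
            right = [y for x, y in rows if x[f] > threshold]
            if not left or not right:
                continue
            l_cls = Counter(left).most_common(1)[0][0]
            r_cls = Counter(right).most_common(1)[0][0]
            errors = sum(y != l_cls for y in left) + sum(y != r_cls for y in right)
            if best is None or errors < best[0]:
                best = (errors, f, threshold, l_cls, r_cls)
    if best is None:  # no usable split in this bootstrap sample
        majority = Counter(y for _, y in rows).most_common(1)[0][0]
        return lambda x: majority
    _, f, threshold, l_cls, r_cls = best
    return lambda x: l_cls if x[f] <= threshold else r_cls


def fit_forest(rows, n_trees, rng):
    """Bagging: every tree is trained on a sample drawn with replacement."""
    return [fit_stump([rng.choice(rows) for _ in rows]) for _ in range(n_trees)]


def predict(forest, x):
    """Majority vote over the predictions of the individual trees."""
    return Counter(tree(x) for tree in forest).most_common(1)[0][0]


# Hypothetical rows: ((net RTR score, prior preference flag), changed yes/no).
rows = [((-2.0, 1), 0), ((-1.8, 0), 0), ((-1.5, 1), 0),
        ((-1.2, 0), 0), ((-1.1, 1), 0), ((-1.0, 0), 0),
        ((1.0, 1), 1), ((1.1, 0), 1), ((1.3, 1), 1),
        ((1.6, 0), 1), ((1.9, 1), 1), ((2.0, 0), 1)]

forest = fit_forest(rows, n_trees=25, rng=random.Random(0))
high = predict(forest, (2.0, 0))   # strongly positive RTR score
low = predict(forest, (-2.0, 1))   # strongly negative RTR score
```

Because each stump sees a different resample, individual trees may place their threshold differently; the vote smooths over that variation.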
In our application, we create this prediction model to evaluate its properties, mainly the prediction quality and variable importance. We can thus learn to understand which aspects of the debate’s real-time response and survey data helped the machine learning system to correctly anticipate change in an individual’s perception of the debate winner.
Random forests are considered a very robust technique with regard to class imbalance, tuning parameters, and non-linear relationships in data, and are thus well suited to the type of quasi-experimental data we use in our research (James et al., 2013, pp. 318–319).
Measures
Four sets of variables were created from the RTR and panel survey data: (1) one endogenous variable of changes in debate winner perception; (2) real-time response measures; (3) exogenous variables from the pre-survey on party identification, candidate preferences in terms of who is preferred as the future chancellor, candidate images on credibility, likability, leadership, and competence, and pre-debate expectations about who will win the debate; and (4) a set of control variables.
Endogenous variable
To evaluate changes in winner perceptions during the debate we asked all participants for their pre-debate winner expectation and compared the responses given with the post-debate winner perception (Randomized options: Scholz, Baerbock, Laschet, Tie). The total number of participants whose perception changed during the debate is 2053 (44.50% of participants in the sample). Figure 2 provides an alluvial diagram visualizing the total change for each option from the pre- to post-survey question, showing the total movement induced by the debate.
The diagram shows the shift from pre-debate winner expectation to post-debate winner evaluation induced by the debate performance. Annalena Baerbock and Armin Laschet convinced more viewers during the debate and increased their proportional share. Olaf Scholz lost favor with viewers during the debate. The share of viewers expecting a “draw” also declined significantly. In the post-debate evaluation, most viewers reported a clear winner based on their perception of the debate.
About 55 percent of the participants (N = 2560) did not change their initial expectation about who would win the debate (including no change from draw to draw). The Green candidate Baerbock was able to convince 24 percent of the sample (N = 1091) that she had won the debate, even though they had chosen a different option beforehand. By contrast, Laschet was able to win over 9 percent of the study participants (N = 425). For the future chancellor Scholz, this was true in only 5 percent of cases (N = 250). Changes from no response to any candidate (2%; N = 96) and shifts from any candidate to draw (4%; N = 191) were relatively rare.
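How the change indicator relates to these counts can be checked with a short Python sketch. The counts are the ones reported in the text; the function name and the four example respondents are hypothetical, and comparing the two survey answers directly is an assumption about how the variable was built.

```python
def perception_changed(pre, post):
    """True if the post-debate winner differs from the pre-debate expectation."""
    return pre != post


# Hypothetical respondents: (pre-debate expectation, post-debate perception).
responses = [("Scholz", "Baerbock"), ("Tie", "Tie"),
             ("Laschet", "Laschet"), ("Scholz", "Tie")]
flags = [perception_changed(pre, post) for pre, post in responses]

# Counts reported in the text.
total = 4613
unchanged = 2560
changed_by_group = {
    "to Baerbock": 1091,
    "to Laschet": 425,
    "to Scholz": 250,
    "no response to any candidate": 96,
    "any candidate to draw": 191,
}

changed = sum(changed_by_group.values())
share_changed = round(100 * changed / total, 1)
share_unchanged = round(100 * unchanged / total, 1)
```

The five groups of changers add up to the 2053 participants reported as having changed their perception, and together with the 2560 unchanged cases they account for the full sample of 4613.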
Real-time response measures
Participants provided feedback in real-time about their evaluation of the candidate statements on a five-point scale from −2 (very poor) to +2 (very good) for all three candidates separately. No input was considered a neutral impression and thus corresponds to the value 0 (in accordance with the instructions given to the participants). We deviate somewhat from previous RTR-based research, since we consider not just the average evaluation of the candidates over the whole debate or on recurring policy issues, as is common, but we also use our large and heterogeneous sample to consider the RTR-evaluation on every single statement made by the candidates. Therefore, we coded the content of the debate adapted from an established codebook by the German Longitudinal Election Study (2019). This coding left us with 293 speech phases in total (Baerbock = 91, Laschet = 100, Scholz = 102; moderators excluded). For every time interval of any single statement, we summed up positive and negative ratings the participants had cast for the respective candidate. In order to account for delays in responses to the stimulus, we follow Nagel (2012, p. 155) and consider RTR-ratings up to four seconds after the speech phase of a candidate has ended.
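The aggregation per speech phase, including the four-second response lag, might look as follows in Python. This is a sketch under assumptions: each click is treated as one discrete rating, the positive and negative ratings are combined into a single net sum, and the function name, the data layout, and the example values are hypothetical.

```python
RESPONSE_LAG = 4  # seconds after a speech phase that still count (Nagel, 2012)


def phase_score(ratings, candidate, start, end, lag=RESPONSE_LAG):
    """Sum one participant's ratings (-2..+2) for `candidate` that fall
    between `start` and `end + lag` (seconds of the debate, inclusive)."""
    return sum(
        value
        for second, rated, value in ratings
        if rated == candidate and start <= second <= end + lag
    )


# Hypothetical input: (second of debate, candidate rated, value) per click.
# Seconds without a click count as neutral (0) and add nothing to the sum.
ratings = [
    (12, "Baerbock", 2),
    (15, "Baerbock", 1),
    (21, "Baerbock", -1),   # 1 s after the phase ends: still counted
    (26, "Baerbock", 2),    # 6 s after the phase ends: ignored
    (14, "Scholz", -2),     # rating for another candidate: ignored
]
score = phase_score(ratings, "Baerbock", start=10, end=20)
```

Applied to each of the 293 coded speech phases, a function of this kind would yield one feature per phase and participant.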
Exogenous variables
Party identification was measured in the pre-survey, where participants indicated whether they identified with any political party and, if so, which one. In our sample, participants identifying with the Green Party predominated (40%; N = 1824) relative to identifiers of the SPD (16%; N = 738) and the CDU/CSU (17%; N = 768).
Candidate preferences were measured in the pre-survey using three categories, with which participants indicated which of the candidates they preferred as the future German chancellor. Baerbock (1st preference = 2005; 2nd preference = 1223; 3rd preference = 1154) and Scholz (1st preference = 1417; 2nd preference = 2551; 3rd preference = 433) show similar preference patterns, while Laschet (1st preference = 1003; 2nd preference = 418; 3rd preference = 2879) ranks far behind in our sample.
Candidate images were measured in the pre-survey on a five-point scale ranging from −2 to +2 regarding their qualities of credibility, likability, leadership, and competence. Scholz is rated highest in terms of leadership with an average rating of 0.25 compared to Baerbock (−0.10) and Laschet (−0.70). The same order also applies to the competence in solving political problems (Scholz = 0.30; Baerbock = −0.02; Laschet = −0.55). The highest level of credibility is attributed to Baerbock (0.41), followed by Scholz (−0.08) and Laschet (−0.71). A similar pattern emerges for likability: Baerbock (0.59) ranks highest in our sample, well ahead of Scholz (0.14) and Laschet (−0.64).
Finally, the participants were asked prior to the debate which of the candidates they expected to win or whether they anticipated a draw. Most expected Scholz to win (36%, N = 1675), well ahead of Baerbock (26%, N = 1206) and Laschet (15%, N = 689). However, 21% of respondents (N = 945) did not expect a clear winner.
Control variables
Analyses were controlled for age, gender, education, and political interest, which were measured in the pre-survey. Age was measured in seven categories ranging from under 18 to over 70 years. The gender variable covers male, female, and non-binary. Formal education was measured in six categories reflecting the German educational system. Respondents indicated their level of political interest on a five-point scale from very weak to very strong.
Results: what changed winner’s perceptions during the debate
Based on the aforementioned measures and methods, we investigate whether a common pattern can be identified between the participants’ responses in the pre-survey and their real-time evaluations of statements made by the candidates in cases where their perception of the winner changed during the debate. First, our goal is to determine the relationship between political predispositions and the real-time rating of candidate statements and, second, whether the candidate statements that are crucial for a candidate to be considered the winner of the debate share common characteristics. We apply two different machine learning techniques to the three dichotomous variables “change to Baerbock”, “change to Laschet” and “change to Scholz”. First, we apply a random forest algorithm (RF) to determine which variables were most influential in shifting winner perception overall for each candidate. In the second step, we apply decision tree models to visualize and comprehend the connection between these different variables in more detail. We predict each dependent variable separately in individual models, allowing us to find the variables responsible for changes in debate winner perception for each candidate individually. Applying these techniques, we can demonstrate that our models are capable of correctly learning pre-debate data and RTR response patterns in connection to each other and of predicting winner perception change in a significant majority of cases based on this information. An extensive section of robustness checks for all models can be found in the Annex of this article, including a model of only TV-viewer data (Sections 1 and 2). Section 6 contains an RF model predicting the shift away from Scholz as the predicted winner. Section 8 of the annex presents various robustness checks and decision tree results for combined RTR and pre-debate data.
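To make the three dichotomous dependent variables concrete, the following sketch derives them from a participant's pre-debate expectation and post-debate verdict. It is an illustration only: the function name `change_targets` and the category labels are assumptions, and the exclusion of "no response" cases from the positive class follows the article's note on those cases rather than any published code.

```python
CANDIDATES = ("Baerbock", "Laschet", "Scholz")

def change_targets(pre, post):
    """Return the three 0/1 targets for one participant.

    `pre` and `post` are the expected and perceived winner; allowed
    values (assumed labels) are the candidate names, "draw", and
    "no.response". A switch counts as positive only if the participant
    named a different option before the debate; switches from
    "no.response" are not treated as positive signals.
    """
    targets = {}
    for candidate in CANDIDATES:
        changed = (
            post == candidate
            and pre != candidate
            and pre != "no.response"
        )
        targets[f"change to {candidate}"] = int(changed)
    return targets

print(change_targets("Scholz", "Baerbock"))
print(change_targets("no.response", "Laschet"))
```

Each of the three resulting columns would then serve as the outcome of its own model, matching the one-model-per-candidate setup described in the text.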
Our primary RF results for victory change to Annalena Baerbock (Model 1) and to Armin Laschet (Model 2) are presented in Tables 1–5 of the article.
Changes in debate winner perception towards Baerbock
For Model 1, the out-of-bag AUC is 93.22. We can predict 87.4% of all cases (change and non-change) correctly. The error rate in predicting true positives is 26.49%, as shown in Table 2. This means we can correctly predict about 74% of the participants who changed their winner perception to Annalena Baerbock (AB), based on pre-debate and RTR response patterns. Even more informative is the evaluation of the variable importance measures (VIMs), which indicate which variables affect prediction quality the most. A ROCAUC diagram showing the trade-off between specificity and sensitivity of the model is included in Fig. 3, and a list of the 15 most influential variables is presented in Table 3.
We can deduce that pre.victor, the pre-debate winner preference, is the most important factor for the ML algorithm to learn whether a change to Baerbock is probable. This is self-evident because the algorithm needs a reference point of what was preferred before the debate to correctly estimate change probabilities based on the response patterns of participants (see note 3).
The second most important factor is the chancellor preference for Annalena Baerbock (pre.chancellor.ab), which indicates how strongly participants support the candidate as the future chancellor. Pre.ab.comp and pre.ab.cred follow on the list, indicating the importance of the pre-debate competence and credibility estimation for Annalena Baerbock in predicting change. Two candidate statements appear as important variables predicting change to Baerbock: V204 (rank 6) and V261 (rank 7). Furthermore, V137, V288, V202, and V260 are also in the top 15 list as important moments of the debate that determine change to Baerbock. Additionally, the perceived competence of Scholz (rank 11) and Laschet (rank 8) is an important indicator, as is likability for Baerbock (rank 9). Age (rank 14) was also an indicator of positive change, indicating that Baerbock’s performance resonated more with viewers in the age groups 20–29 and 30–39 and led to an increased shift in these demographics (Fig. 4).
Regarding the candidate statements (complete debate statements are provided in the Annex in German and English), we find that V204 was a statement by Baerbock in which she attacks the Grand Coalition, of which her contenders Scholz (SPD) and Laschet (CDU) were part, explaining that their response to the constitutional court’s order to revise the current climate protection laws of Germany was insufficient in the face of the current climate issues.
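The relationship between the reported error rate for the positive class and the share of correctly predicted changers can be written out explicitly. The sketch below is illustrative: the function name and the confusion-matrix counts are hypothetical and are not taken from Table 2; only the 26.49% error rate comes from the text.

```python
def classification_rates(tp, fn, fp, tn):
    """Basic rates from a 2x2 confusion matrix (positive class = change)."""
    total = tp + fn + fp + tn
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / (tp + fn),           # share of changers found
        "specificity": tn / (tn + fp),           # share of non-changers found
        "positive_class_error": fn / (tp + fn),  # 1 - sensitivity
    }

# Hypothetical counts, chosen only to show how the rates relate
rates = classification_rates(tp=80, fn=20, fp=10, tn=90)
print(rates)

# Error rate for the positive class as reported in the text
reported_error = 0.2649
implied_sensitivity = 1 - reported_error
print(round(implied_sensitivity, 4))
```

Because sensitivity and the positive-class error rate sum to one, an error rate of 26.49% corresponds to roughly 73.5% of changers being predicted correctly, which is the basis for the statement in the paragraph above.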
The second most important statement according to the RF algorithm was V261, in which CDU-frontrunner Laschet countered Baerbock by saying that Germany already provides a modern immigration law and that no improvements are needed.
In order to better understand the effect of crucial candidate statements for changes in debate winner perception towards Baerbock throughout the debate, we use the second machine learning technique to generate individual decision tree models. The results are presented in Fig. 5. The decision tree follows through pathways of causal conditions (answered by yes: left branch; or no: right branch) that might lead to a change in debate winner perception. The model is specified to seek pathways that end with a high probability of change to the respective candidate.
This decision tree shows decision pathways that lead to high frequencies of changing participants (blue nodes) and non-changing participants (green nodes). Every node presents an overall trend in the first line of the box (0: majority does not change/1: majority does change in this node), a percentage of the proportions of each category (e.g. node 1: 76% no change/24% change) and in the last line a percentage value about how many cases are included in this category from the overall sample of 4613 participants (node 1: 100%). Conditions are shown below the node separating the previous node into two smaller nodes consisting of cases separated by the condition. If the condition is fulfilled the pathway continues left, if the condition is not fulfilled the pathway continues right. To reach D (node 7) the condition pre.victor = Baerbock, Laschet and no.response as well as the condition V204 < 1.5 must NOT be fulfilled.
While A and B in Fig. 5 are rather complex and together represent only 5% of total cases, pathways C and D are easier to interpret and, with 7% and 10%, respectively, represent a majority of cases with a change in debate winner perception. This rather large section of cases is interpreted in detail here (see note 4). Combination D, the most prevalent pathway to change, represents 10% of all valid cases, and participants included here have a 70% chance of expressing a change towards Baerbock after the debate. To be associated with combination D, viewers must have provided the following inputs: they specified Scholz as their pre.victor preference (tree node 1) and provided an RTR response higher than 1.5 points (agreement) to candidate statement 204 (tree node 3). This combination is visualized in Fig. 6 to show how these two variables are associated with winner perception change towards Baerbock. We clearly observe that participants with positive values of >1.5 in V204 change to Baerbock after the debate if they were not already expecting Baerbock as the winner. We also note that participants with a winner expectation for Laschet rarely express support for V204, which explains why the decision tree model excluded this pre-debate preference from pathway D (node 1).
The effect of V204 in the tree model for all cases with Scholz and draw as pre.debate winner expectation. Participants with pre.victor Baerbock were excluded because these participants did not switch their expectation. Participants expecting Laschet to win rarely agreed with V204 and thus did not show up in the tree pathway.
If participants provided an input of 1.5 points or lower in V204 (not agreeing too strongly with the above-mentioned passage about the perceived insufficiency of the Grand Coalition’s climate change legislation), another pathway to change towards Baerbock exists via three other candidate statements. This pathway C is characterized by the following combination: V157 ≤ 0.5 points (node 6), V207 > −0.5 points (node 13), and V137 > 0.5 points (node 27). If participants responded with this combination of RTR responses, they had a 59% chance of changing their winner perception to Baerbock.
Regarding these three important candidate statements, V157 is a sarcastic statement by Laschet towards the Green party, in which he (indirectly) accuses Baerbock of a climate policy of “bans” and “slogans”. For a shift to Baerbock, the RTR rating cannot be higher than slightly positive. It is in fact negative for most participants reporting a change. V207 is an attack by Baerbock in which she accuses Laschet of an opportunistic campaign regarding electromobility and gives a reply to his statement on “slogans” and “bans” (V157). She combines this accusation with a clear policy statement that the future of individual mobility is electric. This scene is embedded in a direct confrontation with Laschet. For a shift to Baerbock, agreement with these phrases had to be positive or only slightly negative. The last statement relevant to this path was the agreement with V137, where Baerbock rejects Laschet’s demand for an independent digitalization ministry because she declares it a cross-sectional task for the chancellor.
To demonstrate the effect on winner perception we visualized the relationship between candidate statements V137 and V157, both active speaking phases of Baerbock in pathway C, in Fig. 7. Both represent strong elements in the decision tree chain to indicate a positive shift in pathway C. V157 does not appear among the top 15 variable importance ranks of the RF model for change to Baerbock (it appears at position 26), but it is a crucial feature for predicting change in the model for change to Laschet (top 5), who was speaking at this moment of the debate, as well as in the robustness check model without online streamers provided in the annex.
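Pathways C and D can be read as plain if-then rules. The sketch below encodes the thresholds and leaf probabilities reported in the text; it is not the fitted tree itself. The function name, the dictionary layout of the RTR inputs, and the treatment of missing statements as 0 are assumptions. That pathway C shares the pre.victor condition with pathway D is inferred from the node numbering (node 6 lies below node 3) and should be checked against Fig. 5.

```python
def baerbock_pathway(pre_victor, rtr):
    """Return ("D", 0.70), ("C", 0.59), or None for one participant.

    `rtr` maps statement ids such as "V204" to the participant's
    rating; statements without input count as 0 (neutral).
    """
    # Node 1: these pre-debate expectations take the left branch,
    # which does not lead to pathways C or D.
    if pre_victor in ("Baerbock", "Laschet", "no.response"):
        return None

    def value(statement):
        return rtr.get(statement, 0)

    # Node 3: the condition V204 < 1.5 must NOT hold for pathway D.
    if not value("V204") < 1.5:
        return ("D", 0.70)

    # Nodes 6, 13, 27: thresholds for pathway C as given in the text.
    if value("V157") <= 0.5 and value("V207") > -0.5 and value("V137") > 0.5:
        return ("C", 0.59)

    return None

print(baerbock_pathway("Scholz", {"V204": 2}))
print(baerbock_pathway("draw", {"V204": 1, "V157": -1, "V207": 1, "V137": 2}))
```

Written this way, it becomes visible that the second element of each result is a leaf-level change frequency, not an individual prediction: 70% of participants on pathway D and 59% on pathway C switched to Baerbock.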
Changes in debate winner perception towards Laschet
Using the same techniques, we find the following results for Armin Laschet (AL; CDU). RF results for change to Armin Laschet are presented in Table 4. The model’s out-of-bag AUC is 95.86. We can predict 95.9% of all cases correctly. The error rate in predicting true positives of change to AL is 24.94%. The corresponding ROCAUC is presented in Fig. 8. The evaluation of the top 15 variables (Table 5) in terms of variable importance (VIM) shows that for true positives (a change to Laschet (1)) the pre-debate chancellor preference for Laschet is even more significant in predicting change than the pre.victor variable. This variable occupies the first place for viewers who changed their preferences. This could indicate that if somebody preferred Laschet as future chancellor but was initially skeptical about his debate-winning chances, the televised debate had a rather positive impact on the participant’s perception. Pre.victor is in second place, followed directly by the first candidate statement (V288).
Additionally, the following list presents candidate statements that are, in descending order, strongly relevant to a shift in debate winner perception towards Laschet: V157, V42, V191, V153, V273, and V262. Among the pre-debate variables, the credibility of Laschet (pre.al.cred) and the competence and likability of Baerbock (pre.ab.comp and pre.ab.symp) play an important role in explaining changes towards Laschet. Interestingly, the variable competence of Baerbock (rank 9) is of higher variable importance than the same variable for Laschet personally (rank 12).
Regarding the “defining moments” of the debate in terms of changes in debate winner perceptions towards Laschet, his statement V288 (rank 3) is a form of self-humbling and patriotic address to the electorate at the end of the debate, affirming that the choice is to be made only by the voters. This statement increased the probability of a change more than any other statement. Again, we applied decision trees to evaluate the pattern of responses found in participants switching to Laschet. The results are presented in Fig. 9. RF and decision tree results are rather similar; thus, we interpret all relevant candidate statements here in relation to their decision tree pathway.
This decision tree shows decision pathways that lead to high frequencies of changing participants (blue nodes) and non-changing participants (green nodes). Every node presents an overall trend in the first line of the box (0: majority does not change/1: majority does change in this node), a percentage of the proportions of each category, and in the last line a percentage value about how many cases are included in this category from the overall sample of 4613 participants. Conditions are shown below the node separating the previous node into two smaller nodes consisting of cases separated by the condition. If the condition is fulfilled the pathway continues left, if the condition is not fulfilled the pathway continues right.
Again, four main pathways for change are found to be important by the decision tree model. Pathway F represents the smallest number of cases, amounting to <0.5% of the total. The combined effect of agreement with V153 and agreement with V157 as important components of this pathway is shown in Fig. 10. V157 was, as mentioned before, the moment Laschet accused the Green candidate Baerbock of a climate policy consisting of bans and slogans. V153 presents a defensive speech by Laschet answering a provocative question by the moderator about how many days (of natural disaster) are needed to profoundly change German climate policy. He combines this defense with the clear statement that the German phase-out of nuclear energy before the withdrawal from coal was a mistake if the goal was to achieve climate neutrality.
Next, we focus our analysis on pathways E, G, and H (each pathway representing ~2% of total cases). While the three other pathways include a positive response to speech moment V157, pathway E includes cases that are not in high agreement with Laschet’s statement about bans. We assume viewers less focused on climate issues would fit this pattern. Additionally, the following set of responses must be present for pathway E: the response value in V288 must not be lower than 1.5, and the value in V42 must not be below −0.5. Also pre.victor estimation cannot be Laschet or no.response in pathway E. 86% of participants with this combination showed a shift to Laschet during the debate.
V288 was Laschet’s patriotic address at the end. V42 was a direct attack on Scholz for his handling of the Wirecard scandal and regarding a current investigation into the federal ministry of finance (which Scholz was heading at this time). The combination of V157 with a threshold of 0.5 and having a positive agreement with V42 as a line of attack in the pathway clearly indicates that pathway E was mostly important for viewers that switched from Scholz to Laschet. It demonstrates how the Wirecard Scandal and the ongoing investigation were used by Laschet to successfully score political attacks against Scholz.
Pathway H requires the following combination of responses during the duel: V157 ≥ 0.5 and V288 ≥ 1.5 while the pre.victor preference was either Scholz or draw. We conclude that this pathway together with pathway G also did not work very well on participants who preferred Baerbock before the debate started. The combination of agreement with V157 and V288 resonated better with pre-debate supporters of Scholz or viewers who expected a draw (more evidence for this is in the appendix, Section 8, Table A8).
Changes in debate winner perception towards Scholz
In this last section of the analysis, we examine the pattern in RTR and pre-survey data for change in debate winner perception towards Scholz. At the beginning of the debate, most of the participants expected him to win the debate. But he only convinced 250 additional participants during the debate to perceive him as the winner, while simultaneously losing 935 to the other two contenders. This means that the imbalance between change and no positive change in his RF models was very high. Thus, the RF algorithm for positive change did not gain enough sensitivity to pick up positive change for the majority of these cases (see note 5). We decided to focus only on decision trees for the analysis of positive pathways that lead to change towards Scholz. Decision trees are not dependent on class balance and thus provide clear and interpretable results in this setting without the limitations of RF. The decision tree generated for positive change towards Scholz is presented in Fig. 11.
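The degree of class imbalance for the "change to Scholz" outcome can be quantified with the two figures given in the text (250 changers among 4613 participants). The sketch below computes the share of positive cases and the accuracy of a trivial classifier that always predicts the majority class; the function name is hypothetical, and the calculation is a generic illustration of why overall accuracy says little about sensitivity here, not a reproduction of the authors' RF diagnostics.

```python
N_TOTAL = 4613            # participants in the sample (from the text)
N_CHANGE_TO_SCHOLZ = 250  # participants who switched to Scholz (from the text)

def majority_baseline(n_total, n_positive):
    """Accuracy and sensitivity of always predicting the larger class."""
    n_negative = n_total - n_positive
    accuracy = max(n_positive, n_negative) / n_total
    # If "no change" is the majority, no changer is ever detected.
    sensitivity = 1.0 if n_positive > n_negative else 0.0
    return accuracy, sensitivity

positive_share = N_CHANGE_TO_SCHOLZ / N_TOTAL
accuracy, sensitivity = majority_baseline(N_TOTAL, N_CHANGE_TO_SCHOLZ)

print(f"positive share:    {positive_share:.1%}")
print(f"baseline accuracy: {accuracy:.1%}")
print(f"baseline sensitivity: {sensitivity:.0%}")
```

With only about one participant in twenty in the positive class, a model can reach high overall accuracy while detecting almost none of the switches to Scholz, which is consistent with the sensitivity problem the paragraph describes.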
This decision tree shows decision pathways that lead to high frequencies of changing participants (blue nodes) and non-changing participants (green nodes). Every node presents an overall trend in the first line of the box (0: majority does not change/1: majority does change in this node), a percentage of the proportions of each category, and in the last line a percentage value about how many cases are included in this category from the overall sample of 4613 participants. Conditions are shown below the node separating the previous node into two smaller nodes consisting of cases separated by the condition. If the condition is fulfilled, the pathway continues left, if the condition is not fulfilled, the pathway continues right.
We see that Scholz’s only major pathway to win participants over was via the candidate statements V17, V200, and V244. Only ~1% of participants presented a pattern in these three variables that corresponded to a clear switch to Scholz during the debate. However, participants with this response combination had a high probability of 83% to switch to Scholz. Agreement with V17 (>+1.5) was an important factor for change towards Scholz, but unrelated to the pathways and change patterns of the other candidates. In this statement, Scholz replies to a provocative question of the moderator by defending his position of not ruling out a coalition with the leftist party (“Die Linke”) in a future government. He combines this reply with a humorous statement and stresses the importance of the citizen’s right to decide the electoral results, which—as in the previously presented statement V288 of Laschet—resonates well with viewers.
V244 is a relevant component for him as well as in the decision tree of Baerbock, indicating that this moment played a pivotal role for many viewers to decide between these two candidates. A response over +0.5 (V244) corresponds to a pathway to Scholz, and a response under +0.5 (V244) follows the pathway to Baerbock. In this statement, the SPD frontrunner outlines a positive picture of the development and stability of the pension system in Germany, which he links to a pension guarantee for the general population. Participants agreeing with his pension-related considerations in this segment have a high probability of change to Scholz. A visualization of pathway I is given in the appendix.
Discussion, limitations, and implications
The present paper sought to shed light on the effect of political predispositions and individual statements made during a debate on changes in debate winner perception, using large-N survey and real-time response data from more than 4600 study participants who watched the second Triell between Baerbock (Greens), Scholz (Social Democrats), and Laschet (CDU). We add novel insights to the existing literature: to the best of our knowledge, this is the first time that RTR and survey data collected during a televised debate are analyzed in combination with machine learning techniques. This ties in with recent studies that consider the peculiarities of RTR measurement: these data have a time-series character and are consequently autocorrelated (i.e. the measured values are not independent), they have a hierarchical structure, and basic linearity assumptions are hardly met (Alletsee, 2015; Bachl, 2017; Iyengar et al., 2016; Boydstun et al., 2014b). Secondly, our study provides a large-N and heterogeneous sample, which is why we are able not only to identify determinants of the post-debate verdict on the debate winner but also to investigate the dynamic process of change in this perception, since our sample offers sufficient variance in the dependent variable. Thirdly, based on our methods and measurements we can draw a nuanced picture of the effect of political predispositions, on the one hand, and, on the other hand, the effect of individual candidate statements about individual political issues on the change in debate winner perceptions. Using random forest algorithms and decision tree models, we find rich diversity between the different models of the three candidates in the pattern of how political predisposition impacts changes in the perception of the debate winner.
Summarizing the random forest models first, the pre-debate expectation about the winner of the debate is a decisive factor in whether a change in winner perception can be properly predicted by our models, in accordance with our hypothesis 1. If and how this might be related to our application of machine learning methods and their unique methodological attributes should be considered a question for future research. A second crucial variable is the pre-debate chancellor candidate preference. This is in line with previous findings that emphasize the role of candidate preferences for the perception and evaluation of candidates in televised debates (Schrott and Lanoue, 2013; König and Waldvogel, 2022; Waldvogel et al., 2023) and aligns with our hypothesis 1. With regard to the impact of candidate images, the picture becomes somewhat fuzzy. A candidate’s credibility seems to have a consistent influence on whether a participant would tend to revise his or her debate winner perception. Moreover, perceived competence is an important and consistent variable in the RF models. For both Laschet and Baerbock, their own competence score as well as the opposing candidate’s score impacts a participant’s decision on shifting their winner perception in favor of other candidates. With this in mind, both near-role and role-distant characteristics of the candidates appear to be important variables (Otto et al., 2015; Brettschneider, 2002). However, sociodemographic factors seem to exert only a subordinate influence on a participant’s perception, which is in line with current studies (König and Waldvogel, 2022; Waldvogel et al., 2023).
Regarding hypothesis 1, it is remarkable that party identification hardly plays any role in predicting changes in debate winner perception, given its prominent position in the literature on campaign effects of televised debates (Mullinix, 2015; Warner et al., 2019). On the other hand, this irrelevance is already indicated in the findings on post-debate winner perception and can be explained: party identification is a long-term attachment to a party (Campbell et al., 1960), which is why it is primarily able to explain stability in political attitudes and why debate winner perception does not change. However, it can hardly explain the dynamic process of why a shift occurs (König et al., 2021). This is where the impact of candidate statements on individual policy issues comes into play.
For hypothesis 2 we find that in all RF models, candidate statements on single policy issues are of great importance for correctly predicting a change in debate winner perception. This becomes even more impressive when we look at the pathways of the decision tree models (starting from the pre-debate expectation on who will win the debate), in which the individual’s judgment on concrete candidate statements essentially determines the progression in perception change. Thus, we are essentially consistent with the findings of previous research and extend the insight that candidate statement ratings are not only important for post-debate winner perceptions (Maier et al., 2016; Reinemann and Maurer, 2005), but also substantially influence the dynamic pattern of a change in perception. Therefore, we do not rely on average RTR ratings over the entire debate as opposed to the dominant literature, which gives the misimpression that every moment of the debate is of equal relevance (Alletsee, 2015). Rather, in considering the dynamics of the debate, we use the individual RTR ratings on 293 candidate statements during the debate to identify the importance of individual statements for the tendency of the participants to change their perception of the winner. In doing so, we found at least 12 statements that were crucial for the prediction of changes in debate winner perception in our models. This means that it actually seems to make a difference what candidates say in the course of a televised debate, as political predispositions may be a strong filter but are far from being the only determining factors for the perception of the audience.
Another question is whether these statements share common characteristics or whether they diverge between the contenders. Looking at Baerbock’s decisive moments, attacks (V137, V204, V207) on her political opponent Laschet seem to have been a successful way for her to win over viewers. In general, the confrontational relationship between Baerbock and Laschet seems to have been very relevant to the viewers’ perception of the debate winner. First, the important scenes for a decision in favor of Baerbock also include two (counter)attacks by Laschet (V261, V157). This linkage is, second, also visible in the fact that statement V157 is significant for both Baerbock’s and Laschet’s models. This linkage extends, third, beyond relational strategy in rhetoric, if we consider that the candidate images of both opponents show high relevance in both models. Returning to Laschet’s decisive statements, we see that V42 was also an attack, but this time on the SPD candidate Scholz. It seems that the CDU candidate was trying to prevail on two fronts. The two other statements that were crucial for Laschet can be described as defense (V153) and, in the case of V288, what Schill and Kirk (2014, p. 546) call affirmation of core values. If we look at Scholz’s three most important scenes, we see an inconsistent picture. Baerbock’s attack on both Laschet and Scholz on climate policy stands out and seems to have been important for the decision in favor of Scholz. With a defense combined with an affirmation of core values in V17 and a policy statement on pension policy (V244), no systematic conclusions can be drawn in the sense of the theory of political campaign discourse from Scholz’s pattern. Rather, we see notable diversity among the candidates. For hypothesis 6 we find ample evidence of interaction between RTR and pre-debate data (Fig. 6). Our ML models presented in the main text show large improvements over the RTR-only and pre-debate-only models presented in annex Section 8.
Although our study provides various insights, our findings are restricted by several limitations. First, our RTR study is based on the design of a quasi-experimental field study, which, while providing access to a large and heterogeneous sample, prevents us from entirely excluding relevant omitted variables, even when controlling for relevant variables in our models (e.g. on the role of emotions in debate reception not considered in this study see Waldvogel et al., 2022). Second, our sample strongly deviates from the overall German population. While the size and heterogeneity provide much variance in the data, the generalizability of our findings is limited (Boydstun et al., 2014a). Third, our analysis departs somewhat from the approach of previous studies (Maier and Jansen, 2017; Maier and Renner, 2018) by not systematically coding all statements during the debate and considering them as independent variables in the models. Rather, we use the RTR data and machine learning algorithms to identify relevant statements and then interpret them. Furthermore, the usage of ML techniques comes with methodological limitations explored in our research: to correctly train an algorithm to identify response patterns that correspond with change, the sample of participants changing their position should not be too small, or the random forest technique might not be able to properly learn and predict this true positive category.
Acknowledging these limitations, we are confident that our analysis provides novel insights to empirical debate research. The fine-grained understanding of what predicts a debate winner is important for general political science research: it gives researchers a better understanding of how political communication is perceived by citizens in relation to their preconceptions. We also present a new technique that might become systematically applied for electoral campaign optimization in the future. We present what is, to our knowledge, the first analysis that uses machine learning algorithms to assess the effect of political predispositions vs. candidate statements on changes in debate winner perception, adding nuance to existing literature and expanding its toolbox. In the course of our novel analysis technique, we generated a lot of additional research content and presented it in the appendix and the openly available code files. We encourage further research using the additional data provided.
Data availability
The datasets generated and/or analyzed in the current study are available in anonymized form at Harvard Dataverse: Ettensperger, Felix, 2023, “Replication Data for: How to convince in a televised debate: the application of machine learning to analyze why viewers changed their winner perception during the 2021 German chancellor discussion”, https://doi.org/10.7910/DVN/CBAKME, Harvard Dataverse, V1. Code and replication instructions are provided with the datasets.
Notes
FDP: 7%, AfD: 1%, Left: 6%, undecided: 11%.
Note that these cases were not used as positive signals in the analysis, because we cannot safely determine whether preferences changed during the debate.
In the annex, we present a short evaluation of the problem of dependent variable contamination and elaborate on how ML and regression-based methods differ with regard to this problem (Appendix Section 4: potential variable contamination).
For more details on the candidate statements V225 and V290 relevant to pathways A and B, please see the corresponding appendix section, which presents a full overview of the speech phases in the original German transcript and in English translation (Appendix A 5: Transcript of candidate statements).
Since Scholz lost over 935 participants who switched to other candidates during the debate, we decided to reverse the RF setting and, for his individual case, used the RF algorithm to predict the shift away from him. We do so to demonstrate that high class imbalance is not a fatal limitation of the RF approach and that useful adjustments and analyses remain possible despite this imbalance, and to further investigate the relationship between all debate moments and participants’ candidate shifts. We present these results in the corresponding annex section (Annex Section 6: RF analysis for participants changing away from Scholz).
References
Aalberg T, Jenssen AT (2007) Do television debates in multiparty systems affect viewers? A quasi-experimental study with first-time voters. Scand Political Stud 30(1):115–35
Alletsee M (2015) Informationsverarbeitung in TV-Duellen. Ein mikrofundierter Mehrebenen-Ansatz zur Analyse der Echtzeit-Reaktionen auf Kandidatenaussagen. Politische Psychol 4:275–291
Bachl M (2013) Die Wirkung des TV-Duells auf die Bewertung der Kandidaten und die Wahlabsicht. In: Bachl M, Brettschneider F, Ottler S (eds) Das TV-Duell in Baden-Württemberg 2011. Inhalte, Wahrnehmungen und Wirkungen. Springer VS, pp. 171–198
Bachl M (2017) How attacks and defenses resonate with viewers’ political attitudes in televised debates. An empirical test of the resonance model of campaign effects. In: Schill D, Kirk R, Jasperson AE (eds) Political communication in real time. Theoretical and applied research approaches. Routledge, pp. 225–248
Benoit WL, Hansen GJ, Verser RM (2003) A meta-analysis of the effects of viewing U.S. Presidential Debates. Commun Monogr 70(4):335–350
Benoit WL (2007) Communication in political campaigns. Oxford University Press
Blumenberg JN, Hohmann D, Vollnhals S (2017) And the winner is …?! Die Entstehung des Siegerbildes bei der TV-Debatte 2013. In: Faas T, Maier J, Maier M (eds) Merkel gegen Steinbrück. Analysen zum TV-Duell vor der Bundestagswahl 2013. Springer VS, pp. 59–73
Boydstun AE, Feezell J, Glazier RA, Jurka TP, Pietryka MT (2014a) Colleague crowdsourcing: a method for incentivizing national student engagement and large-N data collection. Political Sci Politics 47(4):829–834
Boydstun AE, Glazier RA, Pietryka MT, Resnik P (2014b) Real-time reactions to a 2012 Presidential Debate. Public Opin Q 78(S1):330–343. https://doi.org/10.1093/poq/nfu007
Breiman L, Friedman J, Stone CJ, Olshen RA (1984) Classification and regression trees. Taylor & Francis
Breiman L (1996) Bagging predictors. Mach Learn 26(2):123–40
Breiman L (2001) Random forests. Mach Learn 45(1):5–32
Brettschneider F (2002) Kanzlerkandidaten im Fernsehen. Häufigkeit, Tendenz, Schwerpunkte. Media Perspekt 6(2):263–276
Boussalis C, Coan T, Holman M, Müller S (2021) Gender, candidate emotional expression, and voter reactions during televised debates. Am Political Sci Rev 115(4):1242–57
Campbell A, Converse PE, Miller WE, Stokes DE (1960) The American voter. University of Chicago Press
Cutler A, Cutler R, Stevens JR (2012) Random forests. In: Zhang C, Ma Y (eds) Ensemble machine learning. Springer, pp. 157–175
De Nooy W, Maier J (2015) When do attacks work? Moderated effects on voter’s candidate evaluation in a televised debate. In: Nai A, Walter AS (eds) New perspectives on negative campaigning. Why attack politics matters. ECPR Press, pp. 285–304
Festinger L (1957) A theory of cognitive dissonance. Stanford
Ho TK (1995) Random decision forests. http://ect.bell-labs.com/who/tkh/publications/papers/odt.pdf
Holbrook TM (1996) Do campaigns matter? Sage
Iyengar S, Jackman S, Hahn K (2016) Polarization in less than thirty seconds: continuous monitoring of voter response to campaign advertising. In: Schill D, Kirk R, Jasperson AE (eds) Political communication in real time. Theoretical and applied research approaches. Routledge
James G, Witten D, Hastie T, Tibshirani R (2013) An introduction to statistical learning. Springer, New York
Jansen C, Glogger I (2017) Von Schachteln in Schaufenstern, Kreisverkehren und (keiner) PKW-Maut. Kandidatenagenda, -strategien und ihre Effekte. In: Faas T, Maier J, Maier M (eds) Merkel gegen Steinbrück. Analysen zum TV-Duell vor der Bundestagswahl 2013. Springer VS, pp. 31–58
Katz E, Feldman JJ (1962) The debates in the light of research. A survey of surveys. In: Kraus S (ed) The great debates. Kennedy vs. Nixon, 1960. Indiana University Press, Bloomington, pp. 173–223
Knobloch-Westerwick S (2008) Informational utility. In: Donsbach W (ed) The international encyclopedia of communication. John Wiley & Sons, Ltd
König P, Waldvogel T, Wagschal U, Becker B, Feiten L, Weishaupt S (2021) The emotional valence of candidate ratings in televised debates. Communications https://doi.org/10.1515/commun-2020-0059
König P, Waldvogel T (2022) What matters for keeping or losing support in televised debates. Eur J Commun 37(3):312–329. https://doi.org/10.1177/02673231211046706
Lang K, Lang GE (1962) Reaction of viewers. In: Kraus S (ed) The great debates. Kennedy vs. Nixon, 1960. Indiana University Press, pp. 313–330
Lanoue DJ, Schrott PR (1991) The joint press conference. The history, impact, and prospects of American presidential debates. Greenwood
Lindemann K, Stoetzer L (2021) The effect of televised candidate debates on the support for political parties. Electoral Stud 69(1) https://doi.org/10.1016/j.electstud.2020.102243
Lloyd R, Bello A, Rennó L (2020) Preaching to the Choir? Presidential debates and patterns of persuasion in a multiparty Presidential system. Public Opin Q 84(4):892–914
Maier J, Faas T (2003) The affected German voter. Televized debates, follow-up communication and candidate evaluations. Communications 28:383–404
Maier J, Faas T (2011) “Miniature campaigns” in comparison. The German televised debates, 2002–09. German Politics 20:75–91
Maier J, Faas T, Maier M (2014) Aufgeholt, aber nicht aufgeschlossen. Wahrnehmungen und Wirkungen von TV-Duellen am Beispiel von Angela Merkel und Peer Steinbrück 2013. Z Parlam 45:38–56
Maier J, Faas T (2015) The impact of personality on viewers’ reactions on negative candidate statements in televised debates. Polit Psychol 4(2):5–23
Maier J, Hampe JF, Jahn N (2016) Breaking out of the lab. Measuring real-time responses to televised political content in real-world settings. Public Opin Q 80(2):542–553
Maier J, Jansen C (2017) When do candidates attack in election campaigns? Exploring the determinants of negative candidate messages in German televised debates. Party Politics 23:549–559
Maier J, Renner A-M (2018) When a man meets a woman. Comparing the use of negativity of male candidates in single- and mixed-gender televised debates. Political Commun 35:433–449
Maier J, Maier M, Faas T (2022) Do televised debates affect voting behavior? In: Schmitt-Beck R et al (eds) The changing German voter. Oxford University Press
Maier M, Strömbäck J (2010) Advantages and limitations of comparing audience responses to televised debates. A comparative study of Germany and Sweden. In: Maier J, Maier M, Maurer M, Reinemann C, Meyer V (eds) Real-time response measurement in the social sciences. Methodological perspectives and applications. Peter Lang
Maurer M (2016) Nonverbal influence during televised debates. Integrating CRM in experimental channel studies. Am Behav Sci 60:1799–1815
Mazara J (2013) Irony in the face(s) of politeness: strategic use of verbal irony in Czech political TV debates. In: Thielemann N, Kosta P (eds) Approaches to Slavic interaction. John Benjamins, Amsterdam
McKinney MS, Warner BR (2013) Do presidential debates matter? Examining a decade of campaign debate effects. Argum Advocacy 49:238–258
Mullinix KJ (2015) Presidential debates, partisan motivations, and political interest. Pres Stud Q 45(2):270–288
Nagel F (2012) Die Wirkung verbaler und nonverbaler Kommunikation in TV-Duellen. Eine Untersuchung am Beispiel von Gerhard Schröder und Angela Merkel. VS Verlag
Nagel F, Maurer M, Reinemann C (2012) Is there a visual dominance in political communication? How verbal, visual, and vocal communication shape viewers’ impressions of political candidates. J Commun 62:833–850
Otto L, Maier M, Glogger I (2015) Image- or issue-orientation? How the presentation modality influences the perception of candidates in televised debates. Polit Psychol 4:215–234
Pattie C, Johnston R (2011) A tale of sound and fury, signifying something? The impact of the leaders’ debates in the 2010 UK general election. J Elections Public Opin Parties 21(2):147–177. https://doi.org/10.1080/17457289.2011.562609
Redlawsk DP, Civettini AJ, Emmerson KM (2010) The affective tipping point: do motivated reasoners ever “get it”? Political Psychol 31(4):563–593
Reinemann C, Maurer M (2005) Unifying or polarizing? Short-term effects and postdebate consequences of different rhetorical strategies in televised debates. J Commun 55(4):775–94
Sears DO, Chaffee SH (1979) Uses and effects of the 1976 debates. An overview of empirical studies. In: Kraus S (ed) The great debates. Carter vs. Ford, 1976. Indiana University Press, pp. 223–261
Schill D, Kirk R (2014) Courting the swing voters. “Real time” insights into the 2008 and 2012 U.S. presidential debates. Am Behav Sci 58(4):536–555
Schrott PR, Lanoue DJ (2008) Debates are for losers. Political Sci Politics 41(3):513–518
Schrott PR, Lanoue DJ (2013) The power and limitations of televised presidential debates. Assessing the real impact of candidate performance on public opinion and vote choice. Electoral Stud 32:684–692
Spieker A (2011) Licht ins Dunkel der TV-Duelle. Rhetorische Strategien und ihre Wirkungen im TV-Duell 2009. Eine empirische Analyse mittels real-time-response measurement. In: Haschke JF, Moser A (eds) Politik-Deutsch, Deutsch-Politik. Aktuelle Trends und Fachergebnisse. Frank & Timme, pp. 75–93
Therneau T, Atkinson B (2019) Package ‘Rpart’—recursive partitioning and regression trees. CRAN. R Package
Waldvogel T (2020) Applying virtualized real-time response measurement on TV-discussions with multi-person panels. Statistics, Politics and Policy 11(1):23–58
Waldvogel T, Metz T (2020) Measuring real-time response in real-life settings. Int J Public Opin Res 32(4):659–675
Waldvogel T, König P, Wagschal U, Becker B, Weishaupt S (2022) It’s the emotion, stupid! Emotional responses to televised debates and their impact on voting intention. Open Political Sci 5(1):13–28
Waldvogel T, König P-D, Wagschal U (2023) All I do is win, no matter what? What matters in gaining electoral support from televised debates. Commun Soc 36(1):127–149. https://doi.org/10.15581/003.36.1.127-149
Warner BR, McKinney MS (2013) To unite and divide: the polarizing effect of presidential debates. Commun Stud 64:508–527. https://doi.org/10.1080/10510974.2013.832341
Warner BR, McKinney MS, Bramlett JC, Jennings FJ, Funk ME (2019) Reconsidering partisanship as a constraint on the persuasive effects of debates. Commun Monogr https://doi.org/10.1080/03637751.2019.1641731
Westen D (2008) The political brain: the role of emotion in deciding the fate of the nation. Public Affairs
Yawn M, Ellsworth K, Beatty B, Kahn KF (1998) How a Presidential primary debate changed attitudes of audience members. Political Behav 20(2):155–81
Funding
Open Access funding enabled and organized by Projekt DEAL.
Ethics declarations
Competing interests
The authors declare no competing interests.
Ethical approval
The Political Department of the Albert-Ludwigs-University Freiburg approved the conducted research on July 13, 2016. The study was conducted in adherence to the principles of the WMA Helsinki Declaration. All necessary measures were taken to protect the rights, privacy, and dignity of study participants. The supervising authority is the Ministry for Science, Research and Art in Baden-Württemberg (Ministerium für Wissenschaft, Forschung und Kunst Baden-Württemberg, Königstraße 46, 70173 Stuttgart).
Informed consent
Informed consent was obtained from all individuals participating in this study prior to the collection of data. All participants were informed via the real-time response app about the purpose of the study, the data collection process, the intended usage of their data, and the general scientific interest of the research team. All individuals were asked to confirm that they were willing to accept the usage of their anonymous and non-personalized data for the purpose of the presented study.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Ettensperger, F., Waldvogel, T., Wagschal, U. et al. How to convince in a televised debate: the application of machine learning to analyze why viewers changed their winner perception during the 2021 German chancellor discussion. Humanit Soc Sci Commun 10, 546 (2023). https://doi.org/10.1057/s41599-023-02047-5
DOI: https://doi.org/10.1057/s41599-023-02047-5