Abstract
Drawing on the Visual Grammar (VG) of Kress and van Leeuwen, the sound analysis of Machin, the context model of Wodak and Meyer, and the multimodal frameworks of Zhang and Bateman et al., this paper proposes a method for Multimodal Critical Discourse Analysis (MCDA) to study the diachronic construction of the interviewees’ images in the BBC documentary series Seven Up. Adopting a qualitative method supplemented with a quantitative one, this study finds that the interviewees’ images are constructed diachronically mainly through changes in the representational, interactive, and compositional meanings of the visual images. In addition, there is a complementary relation, either reinforcing or non-reinforcing, among the visual, verbal, and audial modes. Each interviewee ends up projecting an image totally different from their initial portrayal. This study also discusses how certain modes help reveal the changes in the ideology of the BBC and the changing socio-political reality of the UK.
Introduction
Considered as a pattern of real-life social practice, discourse is shaped not only by written texts and oral speeches but also by various social semiotics, such as images, gestures, and sounds. The rapid advancement of media technology has diversified the ways to deliver information and construct meaning, providing a variety of multimodal discourses bearing rich communicative meanings and multiple ideologies. Documentary, a distinct film genre based on reality, speaks about specific events and situations involving “social actors” who present their lives, attitudes, and values to viewers in a direct and natural way (Nichols, 2010: 14). As the real people, places, events, and backgrounds presented in documentaries can be regarded as a portrayal and representation of real life, documentary has increasingly become a research focus of far-reaching social value. Some documentary studies have examined the image construction of a specific group. For instance, Garrigan (2017) explores the composite identities of Jewish Mexicans by selecting several dialogues from Mexican documentary films for discussion. By exploring the images of Jewish Mexicans constructed in the documentaries, the study further reveals the social fact that Jewishness and Mexicanness can be simultaneously bounded and interconnected.
Seven Up is a BBC documentary series filmed every seven years, which records the growth and changes of 14 interviewees from different social classes in Britain from the age of 7 to 63. The first episode was screened in 1964 and the latest one was released in 2019. The series consists of 9 films (15 installments) in total. Moran (2002: 389) considers that the documentary “points to the significance of class implicitly”, for the different life choices of the 14 interviewees are more or less affected by the class of birth. The series not only depicts the development of these children but also manifests the transition of British society over the past 56 years, and thus bears great research significance. Previous studies of Seven Up mainly focus on ethnography and film narrative (Thorne, 2009), children and performativity (Bruzzi, 2018), and so on, but are rarely conducted from a linguistic perspective. Moreover, instead of discussing the dynamic changes of the interviewees’ images throughout the whole documentary series, these studies mostly take a single film or the first few films of the series as the subject, and the sociopolitical reality of Britain behind such diachronic image construction has been ignored to some extent.
Therefore, this paper attempts to conduct a multimodal critical discourse analysis of the diachronic construction of the interviewees’ images in the documentary Seven Up, investigating the changing socio-political reality of the UK embedded in such construction. To this end, a qualitative-centered method supplemented with a quantitative one is adopted to help expound on the visual meanings in the selected screenshots of the interviewees and their surroundings. Besides, the complementary relation among the different modes is also discussed. This paper first makes a brief review of relevant studies on MCDA in “Literature review”. It then elaborates the theoretical framework for analyzing the diachronic image construction in the documentary Seven Up in “Theoretical framework”. Next come “Research corpus and methodology”, “Analysis and results”, and “Discussion”. Finally, in “Conclusion”, future implications are outlined.
Literature review
As “a Social Semiotic approach to visual communication” (Kenalemang, 2022: 2126), MCDA aims to examine “how discourses and ideologies are carried by both language and other forms of communication such as images, films, and so on, and how these functions serve specific political interests and maintain kinds of social relations” (Li and Li, 2022: 171). Since the 1990s, some researchers in Critical Discourse Analysis (CDA), like Fairclough and Wodak (1997), have been discussing the feasibility of applying theories of CDA to analyze discourses containing diverse semiotic modes. During the past decade, several newer MCDA analytical frameworks have also been proposed and applied to the specific analysis of multimodal discourses. For example, Chen and Machin (2014) combine the interviewing method with a variety of theories put forward by Fairclough, van Dijk, and Kress and van Leeuwen to analyze the text, color, typeface, and composition in the Chinese women’s magazine Rayli.
Beyond the separate analysis of text, image, sound, or sense, the interaction among different modes in a multimodal discourse is also taken into consideration. Zhang (2009) elaborates that there exist two kinds of synergistic effects among the various forms of a discourse, complementary relation and non-complementary relation. Complementary relation refers to the requirement of another mode to assist in achieving the target meaning when one mode fails to express the whole, and it includes two sub-relations, reinforcing relation and non-reinforcing relation. Reinforcing relation means only one mode is the principal form of communication and the others serve as the intensification, while non-reinforcing relation stresses the necessity of each mode in the construction of meaning. On the contrary, non-complementary relation means the second mode plays no role in the expression of the meaning of the first mode.
The synthetic effect of complementary modes in clearly conveying meaning and expressing emotions is the main reason why people use diverse modes in communication. This framework has been applied to some multimodal analyses. For example, Yuan and Nai (2022) explore the synergy between the verbal mode and the visual mode in the BBC documentary Chinese New Year. The study finds that the two modes both show China’s emphasis on historical inheritance as well as innovation, and that their effects are mutually reinforcing. Some researchers also propose other analytical frameworks for verbo-visual interaction in multimodal discourses. Bednarek and Caple (2012), for instance, analyze the discursive construal of news values across written language and images in the online reporting of the 2011 Queensland floods on the website of The Sydney Morning Herald.
Moreover, many analysts have also paid particular attention to exploring the hidden ideology and social reality within the discourses so as to enlarge the research scope. O’Halloran and Smith (2011: 109) point out that ideology and manipulation can be embodied in digital semiotic resources other than language and that the diversity of data can make the critical analysis more objective and comprehensive. So far, some studies have discussed issues such as globalization and localization (Chen and Machin, 2014), the educational reform movement in America (Catalano and Gatti, 2017), and the U.S.-Mexico border crisis (Li and Li, 2022), but few concern social class.
As to the type of discourse, in addition to the First World War monuments (Abousnnouga and Machin, 2013), breast cancer websites (Gibson et al., 2015), news reports on a teaching scandal (Catalano and Gatti, 2017) and cosmetic advertisements (Kenalemang, 2022), the past five years have also witnessed the emergence of a few studies on the critical analysis of dynamic multimodal discourses, such as Teo’s (2021) ideological research on a series of Singapore’s teacher recruitment videos. Although relatively few in number, there are also some multimodal analyses of documentaries. For instance, Cui and Zheng (2023) take the BBC nature documentary Planet Earth II as the corpus, analyzing how text, picture, and sound jointly construct the representational meaning, interactive meaning, and compositional meaning from the perspective of VG. As to the research question, current multimodal analyses of documentaries mostly concentrate on national image construction, such as Wei’s (2023) analysis of the construction and communication of the national image in the documentary China on the Move from the cultural, contextual, and content dimensions.
To sum up, although the existing MCDA studies have provided important implications, there is still room for improvement. First, academic work is still inadequate on the combination of CDA and MDA theories to analyze verbal, visual, and audio modes of a documentary discourse. Second, current multimodal analyses of documentaries rarely focus on the growth of children, especially how social class can influence an individual’s life choice, which may help people understand the close relationship between individuals and society. To fill the gaps, we propose an MCDA analytical method based on the VG of Kress and van Leeuwen, the sound analysis of Machin, the context model of Wodak and Meyer, and the multimodal frameworks of Zhang and Bateman et al. to analyze the verbal language, visual images, audial sounds as well as the complementary relation among the diverse modes in the documentary Seven Up, studying the diachronic construction of the interviewees’ images and further exploring the hidden ideology and socio-political reality.
Theoretical framework
As meanings can be mapped across different semiotic modes, visual structures can also “point to particular interpretations of experience and forms of social interaction”, just like linguistic structures (Kress and Van Leeuwen, 2006: 2). In Halliday’s view, every semiotic system fulfills an ideational function, an interpersonal function, and a textual function. The ideational function refers to representing the world inside and around us, the interpersonal function is to enact social interactions and relations, and the textual function is defined as the attempt to cohere all the elements of message entities and relevant environment (Kress and Van Leeuwen, 2006: 15). Corresponding to the three meta-functions of language, Kress and Van Leeuwen (2006) assume that there also exist three meanings embedded in communicative visual images, which are described as representational, interactive and compositional. The three visual meanings can be respectively embodied as narrative (with a vector) or conceptual representation (without a vector), constitution and maintenance of the interaction between the image producer and viewers, and the way in which the representative and interactive elements are integrated into a meaningful whole.
As to the analysis of the audial mode within the documentary Seven Up, this paper refers to Machin’s (2014) multimodal approach to film music. Apart from verbal and visual semiotic modes, sound and music can also be used “to indicate emotions and provide insights into the inner world of actors, to indicate kinds of characters, and to create continuity and links between scenes” (Machin, 2014: 300). A sound analysis can be conducted from several aspects, including pitch, pitch direction, pitch range, sound qualities, etc. Discussing how sound is presented contributes to revealing the meaning potentials hidden in the multimodal discourse.
In discourse analysis practice, it is equally significant to consider the contextual elements, for there exists a dialectical relationship between discourse and society, which means discourse is not only shaped by situational, institutional, and social structures, but also has a considerable effect on both discursive and non-discursive social processes and actions, specifically ideology, political belief, value judgment and power relation (Wodak and Meyer, 2009: 66). To guarantee the validity and comprehensiveness of the analysis of such a dialectical relationship, Wodak and Meyer (2009: 67) demonstrate a triangulatory approach based on a model of context which comprises four levels: immediate language or text-internal co-discourse, intertextuality, extralinguistic social or institutional variables of a specific “context of situation”, and the broader sociopolitical and historical context. For a multimodal discourse, all the modes, whether verbal (lexicon, grammar, syntax), visual (gestures, images, facial expressions), or audial (intonation, musical notes), can be involved in the first level of context, for all semiotic systems are some kind of language themselves. In the studied documentary series Seven Up, the interviewees sometimes refer to their earlier utterances from the previous episodes, which can be considered as the second level of context, intertextuality. As to the third level, the extralinguistic, situational elements, including gender, age, occupation, education, family, social class, etc. of the interviewees, should be discussed. The fourth level concerns socio-political transformations in Britain or the general socio-political situation in the world at a certain time.
As the latter two levels are generalized as “synchronic and diachronic dimensions” (KhosraviNik and Zia, 2014: 765), a contextual analysis can show how these situational elements and historical changes influence the life decisions of the interviewees and thus the changes in their images in the series.
In addition to the separate analysis of different modes, it is equally important to further study the synergy among these modes. Apart from Zhang’s (2009) framework mentioned in “Literature review”, this paper also refers to Bateman et al.’s (2017: 327–339) analysis of films and moving (audio-)visual images, in which they point out that “film manipulates not only a rich variety of visual cues (themselves ranging over naturalistic images, animation, written language) but also an almost similarly rich variety of audial cues (ranging over sound, music, and spoken language) in an integrated fashion that makes it so powerful”. In order to “fix” an ongoing dynamic experience of a film, researchers can select representative frames and present these in sequence visually. The frames may represent individual shots or particular events or technical framings within shots as necessary for the discussion. Besides, the sound mode that dynamically changes with the images and their synergy in conveying meanings are also studied.
Research corpus and methodology
Research questions
This paper seeks to study the diachronic construction of the images of the interviewees from different social classes in the documentary series Seven Up, and to dig out the ideologies that are disguised under certain construction of the interviewees. A total of 9 films (15 episodes) in the series are watched and analyzed in order to answer the following questions:
(1) How are the interviewees’ images constructed diachronically in the documentary Seven Up?
(2) What changes in these interviewees’ images are constructed?
(3) Why are these interviewees’ images constructed diachronically?
Method and data
This paper mainly employs qualitative research methods, especially case study, to conduct an analysis of the changes in the interviewees’ images in the documentary Seven Up. Meanwhile, quantitative analysis is also conducted as an auxiliary method to ensure the credibility of the research. The documentary series can be accessed on the Bilibili website (https://www.bilibili.com/). As the documentary Seven Up involves 14 interviewees from different social classes in Britain, it is rational to make a classification of them according to their social class of birth, and to observe and find out the common and different features of interviewees of the same class. Combining the background information given in the documentary with our own understanding, we classify the interviewees into three groups: upper class, middle class, and working class (see Table 1). Considering the accuracy and workload of the study, we randomly select 2 interviewees from each social class as the focus: John and Suzy from the upper class, Bruce and Neil from the middle class, Nick and Jackie from the working class.
Afterward, we employ PotPlayer, a video player that can randomly locate scenes by frame, as the tool to collect multimodal data for analysis. Jewitt (2005: 316) claims that screen-based discourses are complex multimodal ensembles of language, sound, image, animated movement, and other modes of communication. As a kind of screen-based discourse, the documentary Seven Up also encompasses various semiotic systems. This paper mainly studies the verbal mode (utterances of the focus interviewees extracted from the subtitles), the visual mode (screenshots of the focus interviewees as well as other elements related to them), the audial mode (pitch and sound qualities of the focus interviewees), and the complementary relation among these modes. It should be noted that we do not focus on the background music because the documentary is composed of interview clips, and the audial mode is mainly formed by the voices of the participants.
In order to ensure the objectivity of the study and the dependability of the results, we use PotPlayer to randomly collect 5 excerpts from the interviews with each of the 6 focus interviewees. The selection of the 30 excerpts in total is based on the following principles: (1) the selected excerpts should contain the focus interviewees’ personal utterances, or their dialogues with the interviewer or other interviewees, and corresponding visual images; (2) the selected excerpts are required to cover the adolescence, youth, middle and old age of each focus interviewee; (3) since the total length of each interview with each interviewee is 6–10 min, the selected excerpts should be no more than 1 min.
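The three selection principles can be illustrated with a small, reproducible sampling sketch. This is not the procedure carried out in PotPlayer; it only shows one way such a constrained random selection could be made repeatable. The grouping of films into life stages (`STAGES`), the function name `pick_excerpts`, the uniform interview length of 480 s, and the seed are all assumptions introduced for illustration, and principle (1), which concerns the content of the excerpt, still has to be checked by the analyst.

```python
import random

# Hypothetical grouping of the films (labelled by the interviewees' age) into
# the four life stages named in principle (2); the paper gives no such mapping.
STAGES = {
    "adolescence": [14],
    "youth": [21, 28],
    "middle age": [35, 42, 49],
    "old age": [56, 63],
}

def pick_excerpts(interview_seconds, n=5, max_len=60, seed=None):
    """Return n (age, start, end) windows with at least one per life stage."""
    if n < len(STAGES):
        raise ValueError("need at least one excerpt per life stage")
    rng = random.Random(seed)
    # Principle (2): draw one film from every life stage first.
    films = [rng.choice(ages) for ages in STAGES.values()]
    # Fill the remaining slots from films that have not been chosen yet.
    remaining = [a for ages in STAGES.values() for a in ages if a not in films]
    films += rng.sample(remaining, n - len(films))
    excerpts = []
    for age in sorted(films):
        # Principle (3): a window of max_len seconds that fits in the interview.
        start = rng.randint(0, interview_seconds[age] - max_len)
        excerpts.append((age, start, start + max_len))
    return excerpts

# Assumed interview lengths in seconds (the paper reports 6-10 min each).
lengths = {age: 480 for ages in STAGES.values() for age in ages}
excerpts = pick_excerpts(lengths, seed=7)
for age, start, end in excerpts:
    print(f"{age} Up: {start}s-{end}s")
```

Fixing the seed makes the draw repeatable, so that another researcher could regenerate the same candidate windows before screening them against principle (1).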
Then from each excerpt, we extract the subtitles and randomly capture a screenshot, and a total of 30 screenshots with corresponding subtitles are obtained as the data. For the captured screenshots, we first classify them according to the subcategories of each of the three visual meanings defined by VG, based on qualitative analysis. Then the classification results are entered into Excel, a component of Microsoft Office that can carry out data processing, information management, and statistical analysis. With the filtering function of Excel, the number and percentage of screenshots in each subcategory are counted, and the statistical tables are generated automatically. Furthermore, conducting an analysis of the captured screenshots and corresponding subtitles, together with the pitch and sound qualities of the focus interviewees, we discuss the complementary relation among the three modes in the diachronic construction of the interviewees’ images.
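The counting step described above was done in Excel; the same tally can be sketched in a few lines of Python for readers who wish to check the percentages. The function name `tally` is hypothetical. The label list reflects only what this paper reports for representational meaning (two conceptual screenshots out of 30, hence 28 narrative ones); it is not the original spreadsheet.

```python
from collections import Counter

def tally(labels):
    """Map each category to (count, percentage rounded to two decimals)."""
    if not labels:
        raise ValueError("no labels to tally")
    total = len(labels)
    counts = Counter(labels)
    return {cat: (n, round(100 * n / total, 2)) for cat, n in counts.items()}

# Representational labels of the 30 screenshots, reconstructed from the
# reported figures: 2 conceptual images, the remaining 28 narrative.
representational = ["narrative"] * 28 + ["conceptual"] * 2

table = tally(representational)
for category, (count, percent) in table.items():
    print(f"{category}: {count} ({percent:.2f}%)")
```

Applying the same function to the labels for interactive and compositional meaning would give the other rows of such a table.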
Analysis and results
Representational meaning
Based on a qualitative classification of the captured 30 screenshots, we enter the results into Excel to calculate the number and percent of screenshots in each sub-category of the three visual meanings, which are shown in Table 2.
The representational meaning of a visual image can be realized through either a narrative process or a conceptual process, and the distinguishing mark is whether there exists a “vector”, which can be represented as an action, eyeline, thought bubble or speech bubble from one participant to another (Kress and Van Leeuwen, 2006: 59). By contrast, with no connection among participants in an image, conceptual representation can be regarded as a reflection of things and phenomena. As indicated in Table 2, the representational meaning of the selected data is mainly reflected through the narrative process (93.33%), which means that the documentary chiefly shapes meaning through images with rich vectors, rather than images simply presenting the information.
As there are only two images that show conceptual representation among the captured screenshots, we choose both of them to conduct a detailed analysis. It is not hard to find a similarity between the two images in Table 3: without a vector, they both simply present where a middle-class interviewee lives after he leaves his original family and starts an independent life. Unlike Bruce, who lives in a traditional house built in a community, Neil lives in the Western Highlands of Scotland temporarily, for he drifts from place to place and has no stable accommodation in youth. The comparison of Bruce’s and Neil’s residences indicates that the two interviewees from the same social class have experienced different life trajectories.
From Neil’s words at different stages of life, an obvious change in his image can be found. At the age of 28, Neil separates himself from the public in many aspects, including hobby, lifestyle, dwelling, etc. In contrast, he chooses to move to London and even lives with Bruce for a while in middle age, which shows that he is no longer as solitary and unsociable as in youth. Besides, as shown in the verbal language of Image 2, Neil refers to his residential experience in Shetland and this kind of intertextuality makes the two interviews connect closer, thus highlighting the change in his image. Meanwhile, Bruce’s kind and sympathetic image is also indirectly constructed when Neil praises him as a “model host”.
There also exists a complementary relation, specifically a reinforcing relation, between the visual mode and the audial-verbal mode, for the change in the pitch of Neil’s voice in the two interviews indicates his different mental state and life attitude. Pitch is used to describe “how high or low a sound is”. A higher pitch can imply energy, while a lower pitch can mean drooping despair (Machin, 2014: 301). In 28 Up, the pitch of Neil’s voice is generally low, which to some extent reflects his anxiety and self-imposed isolation. By contrast, in the later interview, he speaks with a higher pitch and slight laughter. Machin (2014: 302) points out that “a movement from low to high pitch can give a sense of a picking up of spirits”. Such a pitch movement conveys Neil’s open and positive attitude towards life. All these reinforce Neil’s changing image constructed in the visual images.
We also choose two images for a detailed analysis of narrative representation, as shown in Table 4. The two images are both captured from the interviews with Suzy at different ages. In Image 1, there are several vectors, including Suzy’s slightly opening her mouth as she tells her story, looking up at the interviewer, and holding a lit cigarette, etc. Suzy is smartly dressed, with a necklace around her neck, several rings on her right hand, a delicate bag on her lap, and a hint of melancholy on her face. The caption below describes the fact that she has dropped out of school. In the verbal language, Suzy talks about the pressure her parents put on her and refers to her reluctance to the interview at the age of 14. Her voice sounds flat and tired, mixed with obvious tension. Tension refers to “the extent to which the voices speak or sing with an open or closed throat”, and people tend to close up their throats when they become tense in everyday situations (Machin, 2014: 310). Such a sound quality reinforces the presentation of Suzy’s anxious and pessimistic image at the age of 21.
Image 2 is presented mainly as a reaction process with a vector formed by the eyeline of the interviewee. In a reaction process, the participant who does the looking is called the Reactor, while the participant or the visual proposition at whom or which the Reactor is looking is called the Phenomenon (Kress and Van Leeuwen, 2006: 67–68). The Reactor in Image 2 is Suzy, who is looking at the Phenomenon, her husband. This screenshot is captured from the interview with Suzy at the age of 28 when she is married and has a lovely baby. It can be clearly seen that Suzy is tenderly smiling and looking at her husband when the interviewer asks why she is not as anxious and pessimistic as before. The caption below also introduces Suzy’s husband, echoing his appearance in the image. At this time, Suzy speaks with a more open and relaxed throat, which is different from her low and tense voice seven years ago. From these signs, a change in her image from upset to cheerful can be perceived.
Interactive meaning
The interactive meaning of a visual image can be realized by “contact”, or gaze, which can constitute an image act as “demand” or “offer”, and the distinguishing mark is whether the participants are making direct eye contact with the viewers (Kress and Van Leeuwen, 2006: 117–119). According to Table 2, the interactive meaning of the selected data is mainly reflected through constituting an offer act (73.33%) with the viewers, which means that the interviewees have less direct eye contact with the viewers. In other words, most shots are taken from the side, helping present the interviewees’ gestures and expressions objectively and completely.
This paper chooses several examples to analyze how the two image acts are reflected in the captured screenshots, as shown in Table 5. The four images are all about the interviewees from the working class, with the first two concerning Jackie and the latter two concerning Nick. Image 1 and Image 3 serve as the examples of offer act, which clearly present the interviewees’ avoidance of the camera, implying an unwillingness to make eye contact with the interviewer or the viewers.
In Image 1, Jackie is lowering her head, one hand scratching the back of her neck, indicating that she is anxious and reluctant. This screenshot is captured when the interviewer asks her whether she thinks she gets married and settles down too young, which implies that after a girl gets married, she will have a fixed lifestyle. To express her anger and defense of self-dignity, Jackie hits back at the interviewer and then lowers her head to express resistance. The pitch of her tone changes dramatically, rising sharply as she asks the interviewer the question “what do you mean by settle down”. Meanwhile, compared with the other two working-class girls, Jackie replies with a louder voice, which can mean “weight and importance” (Machin, 2014: 311). All these sound qualities highlight her dissatisfaction with the interviewer’s stereotype of women. Actually, Jackie’s reaction to some extent reflects the influence of the British women’s movement of the 1970s on a large group of women pursuing independence.
By contrast, Image 2 shows Jackie’s looking at the camera gently while talking about her marriage and family. Such a demand act indicates a transition in Jackie’s attitude towards the interviewer. The verbal language directly claims her optimistic thoughts about life, and the way she leans back with her head slightly raised also implies her relaxed and cheerful state of mind. Moreover, the photo of her two sons on the table behind reveals one of the sources of her happiness. At 42, Jackie’s voice becomes softer with a relatively moderate pitch and loudness, which implies a closer psychological distance from the interviewer, for softness can suggest “intimacy and confidentiality” (Machin, 2014: 311). Besides, when she says negative words like “not” and “nothing”, her tone takes on a significant weight, conveying an emphasis on the positive and satisfactory life attitude. All these modes reinforce the construction of Jackie’s image as optimistic and contented in middle age.
In the latter two images, the producer intentionally gives Nick a close-up of his face. Nick is born in a rural village with kind but earthy residents and poor educational conditions. In 14 Up, most of the time he avoids the camera by looking down or sideways so that a shy and nervous image in adolescence is constructed. By contrast, in Image 4, Nick is no longer afraid of the camera and even smiles naturally. Meanwhile, his voice becomes louder as well. Actually, he has successfully entered the physics department of Oxford University. In verbal language, Nick describes his attempts to gain confidence and challenge himself in the past seven years. Through relentless hard work, he manages to go outside and enter Oxford University, a turning point in his life, at which a change in his image from shy to confident is thus constructed.
Compositional meaning
The compositional meaning of the selected visual data is mainly realized through three types of placement of the compositional elements on an image: left-right, center-margin, and top-down, which contribute to different information values. If the compositional elements are placed left and right, then the information presented on the image changes from familiar to unfamiliar from left to right. In a top-down layout, elements at the top convey ideal information while those at the bottom show real information. Besides, if the importance of a certain element is expected to be perceived by the viewers, it should be put at the center (Kress and Van Leeuwen, 2006: 195). As Table 2 shows, the center-margin layout has the highest proportion (66.66%), followed by the left-right layout (26.67%), which means that despite the use of all three layouts, the interviewees are mostly placed at the center so that the construction of their images can be strengthened.
In order to explore how the three layouts contribute to the diachronic construction of the interviewees’ images, four examples are chosen for detailed analysis, as shown in Table 6. The first two images both concern the middle-class interviewee, Bruce. In Image 1, Bruce is placed at the center of the whole picture so that his facial expression can be clearly observed. With his head down and his eyebrows furrowed, Bruce is talking about his experience that he spent last summer term alone due to his weakness in taking responsibility in college group activities. A sense of loss can be perceived from his expression and low voice. These visual and audial elements play a reinforcing role in conveying the meaning of the verbal language, thus constructing Bruce’s irresponsible and confused image at the age of 21.
Unlike Image 1, Image 2 presents a top-down layout. At the top, Bruce is half bent over, patiently explaining the math problems to the students; at the bottom, all the students are listening carefully. This layout indicates that Bruce finally finds his ideal career that can help realize his personal value and benefit the society. Moreover, it also shows the social reality that the students who live in the East End of London are in desperate need of teaching resources, which deserves attention. Such a change in the layout helps construct Bruce’s responsible and sympathetic image at the age of 28, different from his irresponsible image as a college student 7 years ago.
Image 3 shows the three upper-class interviewees’ discussion on the strikes of workers in the 1970s in Britain. This image is also an example of a center-margin layout, and here we mainly focus on John (the boy on the left). John is turning his head sideways to make his point to the other two interviewees, his right hand waving. In the verbal language, John first uses the third person pronoun “they” to refer to those workers organizing strikes to demand higher pay, pointing out that such behaviors may do harm to the economic development of the country. Actually, the discussion about strikes and money is triggered under the socio-political background of Britain with the growing power of labor unions at the time. In the 1960s and 1970s, reports about workers’ strikes and the value of money were often found in mainstream newspapers, such as The Times. Therefore, to some extent, the personal views expressed by the interviewees on these topics in the documentary also establish an interdiscursive connection with the discourses in other genres.
Despite concerns that a complete denial of workers’ right to strike could affect democracy, John firmly believes that policies should be formulated to ban strikes, and when the other two interviewees raise objections, he contradicts them directly. When Charles raises his objection, John uses the analogy that putting criminals in prison also deprives them of the freedom to kill and steal, which implies that John regards the workers as equal to criminals to a certain extent. Meanwhile, he accentuates verbs like “strike”, “killing” and “stealing”. All these help construct John’s arrogant and contemptuous image as an adolescent.
In contrast, Image 4 presents a left-right placement of John and a fruit pedlar. At this time, John is doing charity work for a rehabilitation center for disabled children. On the left, John, wearing a tall, thick hat, is smiling at the pedlar in front of him; on the right, the pedlar faces away from the camera. Such a layout suggests a transition from familiar information to something unknown, which can be interpreted in two ways: first, John is placed on the left because the audience who have watched the series all know him, while the pedlar is a stranger; second, there is some kind of change in John’s attitude towards the working class and those in need of help. Through the transition from given information to new information, a change in John’s image from arrogant to courteous is constructed. However, the caption below gives a seemingly unrelated piece of information: John has always wanted to enter politics. In this example, John’s determination to help people is supplemented and strengthened by his words. Despite his ambition and long-cherished wish to become a statesman, he still chooses to devote his efforts to charity, giving care and love to people in need, which highlights his kind-hearted and selfless image in middle age. The multiple modes are in a non-reinforcing complementary relation, for they are all indispensable to the positive construction of John’s image.
Discussion
Based on the multimodal analysis above, we generalize the images of the six focus interviewees in adolescence, youth, middle age, and old age in Table 7. It is clear that the images of almost all focus interviewees have followed a spiral escalation trend of development. The findings shown in the last section match those observed in earlier studies on the documentary series Seven Up that social class plays a significant role in one’s life, and that the personal development of different interviewees reflects changing attitudes (Moran, 2002). However, it must be pointed out that this paper has proposed an analytical approach to documentary discourse from the perspective of MCDA, comprehensively discussing how verbal, visual and audial modes contribute to the diachronic construction of the interviewees’ images. Additionally, unlike previous MCDA studies on documentary discourse, such as Yuan and Nai’s (2022) analysis of the BBC documentary Chinese New Year, this paper integrates Zhang’s (2009) multimodal framework with Wodak and Meyer’s (2009) context model to further explore the contexts of immediate language, intertextuality, extralinguistic variables, and British socio-political reality embedded in the complementary relation among diverse modes, broadening the research dimension and increasing the research depth. Moreover, this study focuses on the dynamic changes in the interviewees’ images as well as the changing ideologies reflected in certain modes, which may provide a reference for future research on biographical discourses.
Actually, despite the fact that the documentary tries to follow the life trajectory of the interviewees with realistic filming techniques, not adding many personal comments or sensational background music, we can still get a glimpse of the political concerns and the changes in ideologies of BBC, one of the main British broadcasters, from the synergy among certain modes in the diachronic construction of the interviewees’ images.
Since the mid-20th century, there has been a popular sociological view that the social class system in Britain has become fixed. According to the producers, the original intention of filming Seven Up was to test whether this view would still hold in the 21st century. In the early films of the documentary series, the interviewees seem to be “categorized” according to their class, for example, the working-class girls and the upper-class boys always appear separately in the same scene (such as Image 1 in Table 5 and Image 3 in Table 6). Additionally, on the questions of hobbies, schools, career plans, and even views on the workers’ strike, the interview excerpts of interviewees from different social classes are also edited together.
Through the completely different verbal replies of the interviewees, the audience can clearly perceive the distinction in choice and cognition among the interviewees from different social classes. Meanwhile, the cooperation of visual and audial modes can better highlight the distinct image characteristics of these interviewees, as analyzed in the last section. Such a presentation of the visual images appears to reflect BBC’s long-held impartial and objective attitude towards political and economic issues but in fact hints at its underlying tendency to believe that the class divide is hard to bridge and can even affect one’s education level and marital life. In the early days of filming the documentary, this ideology about class solidification was intensified under the socio-political background of post-war Britain, which was experiencing the power expansion of labor unions, the acceleration of aging, and the rapid development of industry.
The changing visual meanings of the analyzed images also mirror a gradual shift in BBC’s filming intention. In the subsequent films of the series, the producers tend to include more situational elements as well as new characters, such as Suzy’s husband (Image 2 in Table 4), Bruce’s students (Image 2 in Table 6), and the pedlar John speaks to (Image 4 in Table 6). Though the center-margin layout is still dominant, the more frequent use of the other two layouts helps convey some new information. Besides, unlike the earlier films, the camera starts to focus more on one individual interviewee at a time. All these indicate that the producers gradually pay more attention to life itself, instead of how social class influences one’s development.
Considering the socio-political transformations in Britain, the Thatcher revolution of the 1980s is regarded as shifting wealth and power away from organized labor towards an ever-expanding middle class of individual consumers and shareholders (Moran, 2002: 391). In the meantime, neoliberalism, a revived form of economic liberalism, has been playing an increasingly important role in international economic policy since the 1970s. Such reforms have brought opportunities to many individual strivers like Nick, promoting economic growth. There is no denying that it is easier for middle- and upper-class interviewees to obtain higher levels of education, largely thanks to the class they are born into. In contrast, the interviewees from the working class generally only complete secondary school education or even drop out of school early, which has a great impact on their future jobs. However, Nick’s success in making the leap from working class to middle class also reveals that social class is not an absolute factor and that people have the possibility to make a difference through arduous efforts and sound education.
There are still some areas for improvement in future research. Due to space limitations, this paper lacks a detailed analysis of all the 14 interviewees in Seven Up. Additionally, in terms of the second level of context, this paper mainly focuses on the intertextual dimension of the interviewees’ utterances. Searching for more relevant discourses from other genres and discussing the interdiscursive connections can help improve the analysis. Furthermore, it is recommended to use more quantitative means to assist the qualitative analysis, which can further reduce the errors caused by manual calculation and make the research results more convincing.
Conclusion
To sum up, based on the VG of Kress and van Leeuwen, the sound analysis of Machin, the context model of Wodak and Meyer, and the multimodal frameworks of Zhang and Bateman et al., we have proposed and applied an MCDA method to study what changes occur in the interviewees’ images in the documentary Seven Up and how these changes are constructed. Moreover, the hidden ideologies of BBC and the changing socio-political reality of Britain reflected in such diachronic construction are also explored. The findings are as follows.
First, the interviewees’ images are constructed diachronically mainly through changes in representative meaning, interactive meaning, and compositional meaning of the visual pictures. Besides, there is a complementary relation, either reinforcing or non-reinforcing, among the visual mode, the verbal mode, and the audial mode.
Second, on the whole, the interviewees’ images constructed in Seven Up present a spiral escalation trend of development. Each focus interviewee ends up projecting a totally different image compared to their initial portrayal.
Third, the changes in how certain modes are presented as well as the diachronic construction of the interviewees’ images both reflect BBC’s shifting focus of filming from expressing class solidification to attending to life itself. Meanwhile, the changing socio-political reality of Britain has also injected some fluidity into the social class system.
In short, this study analyzes how the diverse modes in a documentary discourse work together to construct changes in the interviewees’ images and ideologies while taking the multiple contexts into consideration. It therefore bears certain practical significance in enhancing people’s critical awareness of documentaries from a linguistic perspective and in encouraging people to work hard, grasp opportunities and fulfill dreams.
Data availability
The data used in this article are publicly available documentary films, which can be accessed on the Bilibili website (https://www.bilibili.com/).
References
Abousnnouga G, Machin D (2013) The language of war monuments. Bloomsbury, London
Bateman JA, Wildfeuer J, Hiippala T (2017) Multimodality: foundations, research and analysis – a problem-oriented introduction. De Gruyter Mouton, Berlin
Bednarek M, Caple H (2012) ‘Value added’: language, image and news values. Discourse, Context Media 1(2–3):103–113. https://doi.org/10.1016/j.dcm.2012.05.006
Bruzzi S (2018) From innocence to experience: the representation of children in four documentary films. Studies in Documentary Film 12(3):208–224. https://doi.org/10.1080/17503280.2018.1503861
Catalano T, Gatti L (2017) Representing teachers as criminals in the news: a multimodal critical discourse analysis of the Atlanta schools’ cheating scandal. Social Semiotics 27(1):59–80. https://doi.org/10.1080/10350330.2016.1145386
Chen A, Machin D (2014) The local and the global in the visual design of a Chinese women’s lifestyle magazine: a multimodal critical discourse approach. Vis Commun 13(3):287–301. https://doi.org/10.1177/1470357214530059
Cui W, Zheng L (2023) A multimodal discourse analysis of Planet Earth II from the perspective of visual grammar. J Hum Arts Soc Sci 7(12):2455–2459. https://doi.org/10.26855/JHASS.2023.12.010
Fairclough N, Wodak R (1997) Critical discourse analysis. In: Van Dijk TA (ed) Discourse studies: a multidisciplinary introduction. Sage, London, p 258–284
Garrigan S (2017) The composite identities of Jewish Mexicans in Mexican documentary films. J Jew Identities 10(2):155–172. https://doi.org/10.1353/jji.2017.0008
Gibson AF, Lee C, Crabb S (2015) Reading between the lines: applying multimodal critical discourse analysis to online constructions of breast cancer. Qual Res Psychol 12(3):272–286. https://doi.org/10.1080/14780887.2015.1008905
Jewitt C (2005) Multimodality, “reading”, and “writing” for the 21st century. Discourse: Studies in the Cultural Politics of Education 26(3):315–331. https://doi.org/10.1080/01596300500200011
Kenalemang LM (2022) Visual ageism and the subtle sexualisation of older celebrities in L’Oréal’s advert campaigns: a multimodal critical discourse analysis. Ageing Soc 42(9):2122–2139. https://doi.org/10.1017/S0144686X20002019
KhosraviNik M, Zia M (2014) Persian nationalism, identity and anti-Arab sentiments in Iranian Facebook discourses: critical discourse analysis and social media communication. J Lang Polit 13(4):755–780. https://doi.org/10.1075/jlp.13.1.08kho
Kress G, Van Leeuwen T (2006) Reading images: the grammar of visual design, 2nd edn. Routledge, London
Li Y, Li J (2022) Representation of refugees and officials in U.S.-Mexico border crisis newsbites: a multimodal critical discourse analysis. J Soc Sci Humanit 4(1):171–178. https://doi.org/10.53469/jssh.2022.4(01).36
Machin D (2014) Sound as discourse: a multimodal approach to war film music. In: Hart C, Cap P (eds) Contemporary critical discourse studies. Bloomsbury Academic, London, p 297–318
Moran J (2002) Childhood, class and memory in the Seven Up films. Screen 43(4):387–402. https://doi.org/10.1093/screen/43.4.387
Nichols B (2010) Introduction to documentary. Indiana University Press, Bloomington
O’Halloran K, Smith B (2011) Multimodal studies: exploring issues and domains. Routledge, London
Teo P (2021) ‘It all begins with a teacher’: a multimodal critical discourse analysis of Singapore’s teacher recruitment videos. Discourse & Communication 15(3):330–348. https://doi.org/10.1177/1750481321999909
Thorne B (2009) The Seven Up! films: connecting the personal and the sociological. Ethnography 10(3):327–340. https://doi.org/10.1177/1466138109342830
Wei T (2023) The construction and communication of national image by publicity documentary subtitle translation from multi-modal perspective: a case study of China on the Move. J Soc Sci Humanit Lit 6(4):101–105. https://doi.org/10.53469/JSSHL.2023.06(04).20
Wodak R, Meyer M (2009) Methods of critical discourse analysis. Sage, London
Yuan XL, Nai RH (2022) On the multimodal discourse meaning construction in the international communication of “Cultural China”. Foreign Lang Educ 43(5):23–29. https://doi.org/10.16362/j.cnki.cn61-1023/h.2022.05.011
Zhang DL (2009) On a synthetic theoretical framework for multimodal discourse analysis. Foreign Languages China 6(1):24–30. https://doi.org/10.3969/j.issn.1672-9382.2009.01.006
Acknowledgements
This work is part of the achievements of the project funded by the National Social Science Fund of China (18BYY220).
Author information
Contributions
The presented idea has been conceived by DYW and YHX. DYW has collected data, performed data analysis, drafted the manuscript and done the revisions. YHX has done the supervision work.
Ethics declarations
Competing interests
The authors declare no competing interests.
Ethical approval
This article does not contain any studies with human participants performed by any of the authors.
Informed consent
This article does not contain any studies with human participants performed by any of the authors.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Wang, D., Xiang, Y. Diachronic image construction in the documentary Seven Up: a multimodal critical discourse analysis. Humanit Soc Sci Commun 12, 303 (2025). https://doi.org/10.1057/s41599-025-04641-1
DOI: https://doi.org/10.1057/s41599-025-04641-1