Introduction

The importance of adopting a cinematographic approach to film dubbing has been emphasized in audiovisual translation studies (Chaume 2004a; Matamala 2010). In particular, Chaume (2004a) presented twelve codes that offer a cinematographic perspective for analyzing dubbing, marking a significant milestone in advocating for interdisciplinary research that bridges translation and film studies. However, while interdisciplinary studies in audiovisual translation have become quite common, empirical research adopting such an approach remains rare. This study advocates for the adoption of the shot as the analytical criterion, given its role as the fundamental unit of film editing.

Editing holds such a central position in cinema that it has been described as “the foundation of cinematic art” (Pudovkin 2013: 23) or “the grammatical language of cinema” (Giannetti 2014: 136), with its primary task being the connection of shots. Consequently, identifying the types and characteristics of shots becomes highly significant. Among the twelve codes proposed by Chaume (2004a) as analytical criteria for cinematic dubbing studies, the planning code and the syntactic code are based on shots. The planning code pertains to the types of shots, such as close-ups in which a character’s mouth is clearly visible. The syntactic code focuses on shot associations, examining how subsequent shots or audiovisual punctuation marks, such as fades or wipes, can influence the translation. Therefore, the ability to comprehend how shots are connected within the film’s plot and narrative, as well as to grasp the essence of the film text itself, is deemed an essential skill for translators as well as dubbing dialogue writers and dubbing actors. Matamala (2010) likewise argued for the necessity of studying the relationship between synchronization and shots, including close-up shots, long shots, and the presence or absence of actors on-screen. Thus, the analysis of film dubbing dialogue from a cinematographic perspective necessitates a foundation built upon shots. As the duration of individual shots has become shorter and the pace of shot transitions has increased (Cutting and Candan 2015), greater attention should be given to the interaction between shots and dubbing.

All dialogue in a film can be categorized as either on-screen dialogue, in which the speaker is visible, or off-screen dialogue, in which the speaker’s voice is heard but the speaker is not seen. The Pavia Corpus of Film Dialogue, which comprises original Italian and English films, adopts the “dialogue turn” as the unit of segmentation. However, this approach may lead to lengthy segments and inconsistencies between characters’ on-screen appearances and their speech. By contrast, if shots serve as the unit of both analysis and segmentation, on-screen and off-screen dialogue can be clearly distinguished, facilitating a thorough analysis of the differences between the two. For instance, a character may begin speaking while visible on-screen, but the camera may then cut to another character listening, causing the original speaker’s dialogue to become off-screen. Shot-based segmentation allows for accurate alignment with such visual cues.

Nonetheless, manual segmentation of shots in a 100-min film, which can exceed 1000 shots, is time-consuming. To address this, a multimodal corpus tool, ELAN, is used for segmentation. This tool enables the segmentation of text and its annotation with multiple layers of information. The dataset consists of Korean film DVDs (Region Code 1) released in North America during the 21st century. It includes 12 English-dubbed Korean films, representing the complete set of such films available in this format as of 2020, when the data was compiled. After distinguishing between on-screen and off-screen dubbing in these films using ELAN, the study aims to identify the characteristics of each type based on synchronization, which is considered the most important element of dubbing (Chaume 2004b; Chaume 2020). It also examines the key linguistic features of Korean and English bilabials (Spiteri Miggiani 2019), which are essential for ensuring natural synchronization. Given the significant differences between Korean and English, these linguistic features offer insights into the differences in dubbing strategies between on-screen and off-screen dubbing. Additionally, shot-based segmentation can significantly affect translation output, as off-screen segments eliminate the need for strict synchronization, allowing the addition of new dialogue not present in the source text. This study aims to examine how these differences between on-screen and off-screen dialogue affect shot-based translation strategies. It also seeks to provide insights into how shot-based segmentation enables the categorization of various shot sizes and sound types, and how this categorization ultimately influences translation outcomes.

Research background

Shot

According to Bowen (2017), a shot consists of six components: motivation, information, composition, sound, camera angle, and continuity. For the purposes of this research, the components of “composition,” referring to shot size, and “sound” have been selected as the basis of analysis. The selection is motivated by the importance of shot size in dubbing studies (Chaume 2004a, 2004b, 2008) and by prior research that employed sound types, such as voiceover and off-screen dialogue, as criteria for dubbing analysis (Franco et al. 2010; Matamala 2020). The primary reason for adopting these two components of the shot is to facilitate a comparison of dubbing in on-screen situations, where characters appear visually (shot size), and off-screen situations, where they do not (sound). In addition, composition and sound were selected as criteria because they each have clear classifications: eight shot sizes and five sound types, respectively. By contrast, the remaining four components lack similarly well-defined categories, which makes them less suitable for this analysis.

The size of the on-screen character, determined by the shot size, conveys the psychological distance between the character and the audience, influencing the actor’s performance (Tucker 2014: 10). Furthermore, shot size is an important factor in on-screen dubbing due to its impact on synchronization, a crucial technical aspect of dubbing. The size of the character’s mouth changes depending on the shot size, underscoring the need to analyze the translation aspects of on-screen dubbing in relation to shot size. Camera shots are categorized into eight types based on size: XCU (extreme close-up), BCU (big close-up), CU (close-up), MCU (medium close-up), MS (medium shot), MLS (medium long shot), LS (long shot), and VLS (very long shot) (Bowen 2017: 8–20). For instance, as displayed in Fig. 1 below, close-up shots, especially extreme close-up shots, require careful lip synchronization because facial expressions and emotions are prominent and lip movements are clearly visible.

Fig. 1: Camera shots categorized into eight types based on size.

The eight types are: XCU (extreme close-up), BCU (big close-up), CU (close-up), MCU (medium close-up), MS (medium shot), MLS (medium long shot), LS (long shot), and VLS (very long shot). This figure is covered by the Creative Commons Attribution 4.0 International License. Reproduced with permission of Tom Barrance; copyright © 2013–2025 Tom Barrance, all rights reserved.

Sound types are classified into on-screen dialogue, voiceover, voiceover narration, off-screen dialogue, and on-mute dialogue. While on-screen dialogue dubbing is closely linked to shot sizes, off-screen dubbing involves translating only the characters’ voices, as they are not visually present. Translators may have more freedom in off-screen dubbing, as they are liberated from the constraints of synchronization. However, coherence with the on-screen images remains important in off-screen dubbing, requiring careful consideration of the sequence of dialogue and images (Franco et al. 2010: 84). Off-screen dialogue, voiceover, voiceover narration, and on-mute dialogue are all significant elements in dubbing translation strategies, and each is explicitly marked in the dubbing script (Spiteri Miggiani 2019: 129).

Off-screen dialogue refers to instances where the audience hears the voice of a character who is not visible within the frame. The key distinction between off-screen dialogue and voiceover lies in the spatial awareness of the audience: in off-screen dialogue, viewers understand that the character is just beyond the frame, occupying a space that exists within the narrative but remains outside the camera’s focus (Doane 1980: 37). In contrast, voiceover narration involves oral statements “spoken by an unseen speaker situated in a space and time other than that simultaneously being presented by the images on the screen” (Kozloff 1989: 5). Lastly, any dialogue added in the dubbing process but absent from the original text is designated as “on-mute” dubbing, irrespective of whether the character is visible on-screen. Specifically, on-mute dubbing involves inserting additional recorded dialogue, especially when the character remains off-screen (Spiteri Miggiani 2019: 131).

Synchronization

Synchronization is considered an important element of dubbing translation, as “films demand a highly polished synchronization at all levels” (Chaume 2020: 76) compared to texts such as television dramas. If synchronization is not observed in a dubbed film, it can lead to a negative impression on audiences and a failure at the box office (Bosseaux 2018: 53).

Scholars in audiovisual translation have proposed various typologies of synchronization, among which the works of Fodor (1976) and Whitman-Linsen (1992) are foundational. In Fodor’s typology, lip synchrony is subsumed under phonetic synchrony, and his framework also covers what Chaume terms isochrony and kinetic synchrony, which Whitman-Linsen refines into the more detailed distinctions of syllable articulation, utterance length, and kinetic synchrony. For a broader perspective on synchronization, Chaume (2004b) identifies four approaches: professional, functionalist, polysystemic, and cinematographic. These studies stress the importance of achieving synchronization that integrates translations with on-screen visuals while preserving audience immersion and realism.

This study adopts Chaume’s typology of synchronies (2004b), specifically phonetic synchrony, isochrony, and kinetic synchrony, due to its comprehensive yet accessible nature. According to Chaume (2004b: 43–45), synchronization can be categorized into three types. The first is lip or phonetic synchrony, which ensures that the dubbing voice actor speaks in synchronization with the on-screen character’s lip movements. This type of synchronization is particularly critical when the character is shown in close-up. The second is kinetic synchrony, or synchronization with body movements. This is important for dubbing because any discrepancy between what is being said and the character’s body movements may confuse the audience. Finally, isochrony, the synchronization of utterances and pauses, requires that the dubbed utterance match the duration of the character’s speech.

Phonetic or lip synchronization

Although the term “lip sync” is sometimes used to encompass the phenomenon of synchronization as a whole in the context of audiovisual translation research, phonetic synchronization, or lip sync, strictly refers to the synchronization of mouth movements (Chaume 2020: 112). In dubbing practice, phonetic synchronization guides decisions in target language output, sometimes at the cost of naturalness, which can lead to dubbese (Baños 2014; Pavesi 2018; Romero-Fresco 2006; Spiteri Miggiani 2021a). While some authors aim to objectively define dubbing language, or dubbese, many scholars have used negatively charged terms such as “fake,” “artificial,” “antirealistic,” or “stereotyped orality” to describe it (Pavesi 2018: 104). Nonetheless, to achieve synchronization, translators or dubbing dialogue writers may resort to incorporating interjections (Matamala 2009), introducing unidiomatic or unnatural speech patterns (Baños 2014), or using discourse markers and intensifiers (Romero-Fresco 2009). Interestingly, there is a reluctance to use bilabials in dubbed dialogue, especially when the on-screen characters do not produce bilabial sounds (Spiteri Miggiani 2019: 84). According to Spiteri Miggiani (2019: 83), phonetic synchronization in English primarily involves labialized consonants and vowels.

Unlike English, Korean has neither the voiced bilabial plosive /b/ nor the labiodental fricative /v/; Korean speakers therefore often perceive these phonemes as the Korean lenis /p/ and do not distinguish them well (Schmidt 1996). Since the number of bilabials and labiodentals in Korean is small, it is unlikely that these phonemes are extensively used in English-dubbed dialogue. However, phonetic synchronization is not a “phonetic” task of matching phonemes or syllables one-to-one between source and target languages; it is instead a “visual” task of matching dubbed lines to the shape of the characters’ lips on screen (Spiteri Miggiani 2019). Therefore, it is necessary to understand the behavior of bilabials in dubbing through empirical data.

Among the English labialized consonants /b/, /p/, /m/, /v/, and /f/, the sounds /v/ and /f/ do not exist in Korean. The corresponding Korean labialized consonants for /b/, /p/, and /m/ are the bilabial plosives ‘ㅂ’, ‘ㅃ’, and ‘ㅍ’ for /b/ and /p/, and the bilabial nasal ‘ㅁ’ for /m/. In Korean, all initial ‘m’ sounds are bilabial nasals, produced by obstructing the airflow in the vocal tract while allowing some air to pass through the nose. The consonant ‘ㅂ’ represents a voiced bilabial plosive, while ‘ㅃ’ and ‘ㅍ’ represent voiceless bilabial plosives. These plosives are produced by completely stopping the airflow in the vocal tract and then releasing it in a burst. In Italian dubbing scripts, bilabial sounds and vowels are bolded to draw the attention of the dubbing voice actors while reading the script (Spiteri Miggiani 2019: 198). This emphasis is based on the observation that the burst and movement of the lips are more pronounced when bilabial sounds occur at the beginning of a sentence or word than when they appear in the middle of a word. Therefore, the current study focuses on the analysis of these consonants when they occur as the initial sounds in a sentence or word spoken by an on-screen character.
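The word-initial criterion described above can be operationalized for corpus screening. The Python sketch below is illustrative only (the function names are hypothetical, and this is not the study’s actual tooling): it recovers the initial consonant of a precomposed Hangul syllable by codepoint arithmetic and flags word-initial bilabials in Korean and English lines.

```python
# Precomposed Hangul syllables (U+AC00–U+D7A3) encode their jamo
# arithmetically: index = codepoint - 0xAC00; initial jamo = index // 588.
CHOSEONG = list("ㄱㄲㄴㄷㄸㄹㅁㅂㅃㅅㅆㅇㅈㅉㅊㅋㅌㅍㅎ")
BILABIAL_CHOSEONG = {"ㅁ", "ㅂ", "ㅃ", "ㅍ"}
ENGLISH_BILABIALS = {"b", "p", "m"}  # /v/ and /f/ are labiodental, not bilabial

def initial_consonant(syllable):
    """Return the initial jamo of a precomposed Hangul syllable, else None."""
    code = ord(syllable)
    if 0xAC00 <= code <= 0xD7A3:
        return CHOSEONG[(code - 0xAC00) // 588]
    return None

def word_initial_bilabials_ko(line):
    """Korean words whose first syllable begins with a bilabial consonant."""
    return [w for w in line.split()
            if w and initial_consonant(w[0]) in BILABIAL_CHOSEONG]

def word_initial_bilabials_en(line):
    """English words beginning with an (orthographically) bilabial letter."""
    return [w for w in line.split()
            if w and w[0].lower() in ENGLISH_BILABIALS]
```

For example, `word_initial_bilabials_ko("마마마 사람 바보")` returns the two bilabial-initial words 마마마 and 바보, while 사람 (initial ㅅ) is excluded.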

Kinetic synchronization

A character’s facial movements contribute to the rhythm of their speech and serve to accentuate specific words. Actions such as nodding or grimacing often coincide with the nucleus of a character’s utterance, reinforcing the interaction between the nucleus and the character’s body language in aiding the audience’s comprehension of the dialogue (Luyken and Herbst 1991: 160). Furthermore, dialogue writers prioritize visual cues and body language on-screen over the semantic content of the verbal text, often deviating from the original text (Spiteri Miggiani 2019: 7).

It is important to consider that facial expressions and gestures can vary across cultures when addressing dubbing challenges (Schwarz 2011). In this vein, it is essential to consider the relationship between word order and kinetic synchronization in the context of Korean and English. Korean is a head-final language, with “object + verb” order, while English is head-initial, with “verb + object” order. Given these substantial differences in word order, it is necessary to ensure that body movements are synchronized with the respective word orders of the two languages.

Isochrony

In dubbing, it is crucial to match the length of the character’s lines. Neglecting the character’s lip movements can lead to situations where the character’s mouth moves without any accompanying sound, or where the character’s lips remain still while the dubbing voice actor’s voice is audible. Such violations of isochrony can disrupt the viewer’s suspension of disbelief, that is, their willingness to accept the fiction in order to enjoy the cinematic experience. Romero-Fresco (2009: 68–69) redefines this concept for dubbing as the “suspension of linguistic disbelief,” whereby viewers are inclined to overlook the possible unnaturalness of the dubbed dialogue (Romero-Fresco 2006). However, even with this tolerance, the alignment of speech duration is considered a vital factor in enabling the audience to maintain their suspension of disbelief (Spiteri Miggiani 2019: 80).

The significance of isochrony is emphasized as the most critical form of synchronization in real-world dubbing scenarios (Chaume 2004b: 17). To achieve isochrony, Chaume (2012: 72–73) proposes two translation strategies. Amplification involves extending the dubbed dialogue by utilizing techniques such as repetition, paraphrasing, and the use of synonyms. Reduction, on the other hand, involves the omission of repeated expressions, interjections, modal verbs, proper nouns, and address terms. Spiteri Miggiani (2021a: 20) posits that the difficulty in English-language dubbing can be attributed to the inherent nature of the English language, specifically its concise form, which often requires expansion to match the length of other source languages. Considering the variations in speech rate and information density across languages (Pellegrino et al. 2011), it is valuable to explore the dubbing strategies employed when dubbing from Korean to English.

Dubbing strategies

Matamala (2010) provides an in-depth analysis of translation strategies employed in film dubbing. The strategies are derived from corpus building and text analysis of three Hollywood films dubbed into Spanish and Catalan. Six translation strategies are identified: reduction, repetition, amplification, modification, change order, and deletion. These six dubbing translation strategies were selected for this study due to their comprehensive scope: they cover key aspects of the dubbing process, from synchronization to linguistic modifications and the final recording stage. This broad categorization enables a meticulous analysis of Korean dialogue and its English-dubbed translations. The present study examines the translation of dubbed dialogue using these six strategies alongside four additional ones and compares the results to the original dialogue. With the new strategies (in bold) incorporated, the analysis covers a total of ten translation strategies, as depicted in Table 1.

Table 1 Ten translation strategies for dubbing: six established and four newly introduced strategies.

Four additional strategies are introduced and developed in this study. The first is “addition,” which involves adding lines that are not present in the original version but are included in the English dubbing. The second strategy, “matching,” pertains to cases where the number of syllables in a shot exactly matches between the Korean original and the English dubbing. Matamala (2010) focuses on a language pair, Spanish and Catalan, in which the number of syllables in the original and dubbed lines frequently match; her categories, such as change order and modification, are based on the assumption of matched syllable counts. For Korean and English, however, a separate category for matching is necessary, since syllable count matches are rare between these two languages. The third strategy is “transliteration,” where the dubbing voice actor repeats the Korean dialogue, primarily for address terms or interjections. The fourth strategy is “partial deletion,” which occurs when a line is partially omitted rather than entirely removed. Since this research analyzes dialogue based on shots, it is crucial to distinguish partial deletion from complete deletion, as one shot may transition to another during a speech. Partial deletion also differs from reduction: reduction specifically refers to instances where the syllable count of the original dialogue is decreased, whereas partial deletion indicates that certain parts of the original dialogue are absent from the translation.

Unlike the language pair examined in Matamala (2010), the current study focuses on Korean and English, which possess distinct cross-linguistic characteristics, and thus require an adjustment in the approach to examining dubbing strategies. Specifically, since it is uncommon for the original and dubbed versions to have the same syllable count, this study prioritizes analyzing changes in meaning, order, and repetition as primary strategies. When no alterations in meaning or order, or repetition are observed between the original and dubbed versions, changes in syllable count are examined as a secondary focus.
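Because the “matching” strategy hinges on syllable counts, a rough automated screen can shortlist candidate shots for manual review. In the sketch below, Korean syllables are counted exactly as precomposed Hangul blocks, while English syllables are estimated by counting vowel groups; the English count is a crude orthographic heuristic and the function names are illustrative assumptions, not the study’s actual method (the study’s classification was done manually).

```python
import re

def count_syllables_ko(line):
    # Each precomposed Hangul block (U+AC00–U+D7A3) is exactly one syllable.
    return sum(1 for ch in line if 0xAC00 <= ord(ch) <= 0xD7A3)

def estimate_syllables_en(line):
    # Crude heuristic: one syllable per vowel group, discounting a silent
    # final 'e'; every word contributes at least one syllable.
    total = 0
    for word in re.findall(r"[a-zA-Z']+", line.lower()):
        n = len(re.findall(r"[aeiouy]+", word))
        if word.endswith("e") and not word.endswith(("le", "ee")) and n > 1:
            n -= 1
        total += max(n, 1)
    return total

def is_candidate_match(ko_line, en_line):
    # Flag shots whose (estimated) syllable counts coincide, for manual review.
    return count_syllables_ko(ko_line) == estimate_syllables_en(en_line)
```

For instance, 베테랑 (three Hangul blocks) and “veteran” (three vowel groups) would be flagged as a candidate match.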

Methodology

Data

The dataset used in this study is restricted to Korean film DVDs (Region Code 1) that were distributed to the North American market during the 21st century. The decision to focus on DVDs of Korean films was made to ensure a defined scope with comprehensive coverage. As of 2020, when the data was compiled, only 13 Korean films were available on DVD with English dubbing, representing the complete set for this limited format. The film War of the Arrows (2011) was excluded because it primarily features the Manchu language over Korean. Consequently, the dataset includes 12 Korean film DVDs with English dubbing, identified based on the overseas distribution data provided by the Korean Film Council. The film texts analyzed, listed chronologically by their DVD release year, are: Shiri (2002), Oldboy (2005), Memories of Murder (2005), Tae Guk Gi: The Brotherhood of War (2005), The Host (2007), Haeundae (2010), I Saw the Devil (2010), Man from Nowhere (2011), Masquerade (2016), The Admiral: Roaring Currents (2016), Veteran (2016), and Train to Busan (2017).

Seven of the films listed garnered audiences exceeding 10 million viewers in Korea, while the remaining five are either highly acclaimed or hold significant importance in Korean film history. Notably, despite not receiving a high audience rating on IMDb, Shiri is considered a pivotal work in the history of Korean cinema. As the film that inaugurated a new genre in South Korean film history (Diffrient 2001: 45), Shiri remains a worthwhile subject for analysis.

ELAN

ELAN (2020), developed at the Max Planck Institute for Psycholinguistics in Nijmegen, the Netherlands, is a multimodal corpus tool. It enables complex annotation of audiovisual texts with multi-layered annotation capabilities. ELAN supports precise alignment of on-screen utterances with time codes. Key features of ELAN include the ability to annotate an unlimited number of tiers, allowing for the creation of various tiers tailored to different objectives. Annotations are displayed explicitly, and each annotation is written in a standoff format, enabling separate work for each tier. ELAN also provides robust support for both quantitative and qualitative statistical analysis (Sloetjes 2017).

Other multimodal corpus tools, such as ANVIL, Praat, and Taggetti, are available. However, these tools have limitations for dubbing analysis, as they lack the sophisticated linguistic annotation capabilities for visual elements offered by ELAN. Furthermore, not all languages are supported in such programs, and Korean is one of the few non-European languages available in ELAN. At present, ELAN offers language modules for Catalan, Chinese, Dutch, English, French, German, Japanese, Portuguese, Russian, Spanish, Swedish, and Korean.

Mouka et al. (2015) demonstrate the effectiveness of ELAN in audiovisual translation studies. Their research investigates register shifts in racist discourse when English dialogues are subtitled into Greek and Spanish. ELAN proved optimal for visual and linguistic annotation because it allows simultaneous analysis of multiple tiers on a single screen.

Segmentation

Segmentation in the ELAN program (Version 5.9) is conducted by selecting ‘Two keystrokes per annotation’ (non-adjacent annotation) in the Segmentation Mode while the film is playing. The segmentation is performed as a non-adjacent unit, even though film shots are typically adjacent units. This approach is necessary due to the limited number of camera shots where a character speaks. By pressing the enter key at the beginning and end of each shot, a segmented line is created to accommodate the insertion of annotations. In ELAN, film shots and sound can be separated into distinct tiers, as illustrated in Fig. 2 below.

Fig. 2: Segmentation of shots in ELAN 5.9.
figure 2

The shot tier is used to break down on-screen dialogue into individual shots, while the sound tier distinguishes off-screen dialogue. This figure is covered by the Creative Commons Attribution 4.0 International License. Reproduced with permission of The Max Planck Institute for Psycholinguistics; copyright © The Max Planck Institute for Psycholinguistics, all rights reserved.

In Fig. 2, the shot tier is used to break down on-screen dialogue into individual shots, while the sound tier distinguishes off-screen dialogue. This enables on-screen and off-screen dialogue to be analyzed separately. The film dialogue track includes dialogue, automated dialogue replacement, and walla. This study examines not only clearly audible dialogue but also identifiable walla present in the film text’s dialogue track.
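ELAN saves segmentations and annotations in its XML-based .eaf format, which stores time slots separately from tier annotations, so segmented tiers can also be processed programmatically after export. The Python sketch below is an illustrative assumption about such post-processing, not the study’s actual pipeline: it parses a minimal .eaf fragment (constructed inline, with a “shot” tier named after the setup described above) and returns the time-aligned annotations of one tier.

```python
import xml.etree.ElementTree as ET

# A minimal .eaf-style fragment with a "shot" tier; real ELAN files
# carry many more attributes, but the TIME_ORDER/TIER structure is the same.
EAF = """<ANNOTATION_DOCUMENT>
  <TIME_ORDER>
    <TIME_SLOT TIME_SLOT_ID="ts1" TIME_VALUE="1000"/>
    <TIME_SLOT TIME_SLOT_ID="ts2" TIME_VALUE="3200"/>
  </TIME_ORDER>
  <TIER TIER_ID="shot">
    <ANNOTATION>
      <ALIGNABLE_ANNOTATION ANNOTATION_ID="a1"
          TIME_SLOT_REF1="ts1" TIME_SLOT_REF2="ts2">
        <ANNOTATION_VALUE>CU</ANNOTATION_VALUE>
      </ALIGNABLE_ANNOTATION>
    </ANNOTATION>
  </TIER>
</ANNOTATION_DOCUMENT>"""

def read_tier(eaf_xml, tier_id):
    """Return (start_ms, end_ms, value) triples for one ELAN tier."""
    root = ET.fromstring(eaf_xml)
    # Resolve symbolic time-slot IDs to millisecond values.
    times = {ts.get("TIME_SLOT_ID"): int(ts.get("TIME_VALUE"))
             for ts in root.iter("TIME_SLOT")}
    rows = []
    for tier in root.iter("TIER"):
        if tier.get("TIER_ID") != tier_id:
            continue
        for ann in tier.iter("ALIGNABLE_ANNOTATION"):
            rows.append((times[ann.get("TIME_SLOT_REF1")],
                         times[ann.get("TIME_SLOT_REF2")],
                         ann.findtext("ANNOTATION_VALUE")))
    return rows
```

Here `read_tier(EAF, "shot")` yields a close-up annotated from 1.0 s to 3.2 s; the same call on a “sound” tier would recover the off-screen annotations.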

Annotation

ELAN stands out among the various multimodal analysis software available for audiovisual materials due to its optimized annotation features. To differentiate between on-screen and off-screen dialogue, two tiers are created: the shot tier, annotated with information about shot size, and the sound tier, annotated with information about the sound types.

Camera shots were categorized into eight types based on size: XCU (extreme close-up), BCU (big close-up), CU (close-up), MCU (medium close-up), MS (medium shot), MLS (medium long shot), LS (long shot), and VLS (very long shot) (Bowen 2017: 8–20). In close-up shots, particular attention is given to lip synchronization, as facial expressions and emotions play a significant role and lip movements are highly visible. In medium shots, often used in dialogue scenes, the focus expands to include not only the characters’ faces but also their body movements. Conversely, long shots prioritize the surrounding situation over individual characters, so contextual elements are considered during annotation.

Sound is categorized into four types: on-screen dialogue, voiceover, voiceover narration, and off-screen dialogue. “On-mute” lines, which are not part of the original text and are added during dubbing, require separate analysis; these on-mute dialogues were added after extracting the text files from ELAN. ELAN version 5.9, used for the analysis, cannot handle files with two audio channels, so the segmentation and annotation processes were based on the original Korean file. Using the original file as a reference made it possible to determine whether dubbed lines had been added or deleted.

When categorizing sound types, the primary criterion is whether the speaker appears on-screen or not. Based on the aforementioned categorization, the basic sound types in the sound tier are annotated as follows: off-screen (OS), voiceover narration (VON), voiceover (VO), and on-screen invisible mouth (IM). The classification of “on-screen invisible mouth” (IM) is introduced exclusively for this study. It refers to cases where a character’s lip movements are not visible due to obstructions, although the character remains visible on screen. This distinction is crucial for dubbing, as it removes the need to synchronize lip shape and speech length. For instance, when a character’s mouth is covered by a mask, or when the character is turned away or positioned too far to see their lips clearly, IM annotations are applied. In such cases, the dubbing notes should indicate “COVERED” or “FILTERED” (Spiteri Miggiani 2019: 131).

Results and discussion

On-screen dubbing

On-screen dubbing strategies

The total number of dialogue shots in the twelve texts analyzed is 12,543, comprising 7625 on-screen shots and 4918 off-screen shots, which account for approximately 61% and 39% of the total dialogue shots, respectively. As shown in Table 2, Haeundae had the highest number of dialogue shots (2050), while I Saw the Devil had the fewest (536). On average, each film contained slightly over 1000 dialogue shots.

Table 2 Total number of dialogue shots in the twelve analyzed texts.

Given the extensive variety of shot sizes, analyzing dubbing characteristics across all eight categories is not feasible. Instead, shot sizes are commonly grouped into three primary categories: close-up shots, medium shots, and long shots. Therefore, this study focuses on examining on-screen dubbing within these three widely recognized classifications.

Reduction emerges as the predominant translation strategy for on-screen dubbing, accounting for approximately 50% of shots across all three shot sizes. However, each shot size demonstrates unique features that influence various aspects of dubbing. Close-up shots reveal instances where dubbed dialogue is modified from the original to meet synchronization requirements. Medium shots constitute the most frequently employed shot size for dialogue scenes, encompassing 4826 (63%) of the 7625 on-screen shots. In the case of medium shots, the discussion focuses on the prominent use of reduction as a dubbing strategy and illustrates how shot transitions within dialogue scenes can affect dubbing. Finally, the analysis of long shots focuses on kinetic synchronization, identified as the primary finding within this category of shots.
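The shot-size tallies underlying such percentages are straightforward to reproduce from per-shot annotations. The following Python sketch is illustrative only: the mapping of the eight annotated sizes into three bands is my assumption (the grouping boundaries, e.g. whether MCU counts as a close-up, are not spelled out here), while the percentage arithmetic simply reflects the on-screen totals reported in this study.

```python
from collections import Counter

# Assumed mapping of annotated shot sizes to the three analysis bands;
# only the three-band totals are reported, so this grouping is illustrative.
BAND = {"XCU": "close-up", "BCU": "close-up", "CU": "close-up",
        "MCU": "close-up", "MS": "medium",
        "MLS": "long", "LS": "long", "VLS": "long"}

def band_distribution(shot_annotations):
    """Percentage share of each band, rounded to one decimal place."""
    bands = Counter(BAND[s] for s in shot_annotations)
    total = sum(bands.values())
    return {b: round(100 * n / total, 1) for b, n in bands.items()}

# The reported on-screen totals (2083 close-ups, 4826 medium, 716 long)
# yield the percentage shares cited in the analysis.
counts = {"close-up": 2083, "medium": 4826, "long": 716}
total = sum(counts.values())  # 7625 on-screen shots
shares = {b: round(100 * n / total, 1) for b, n in counts.items()}
```

Running this on the reported totals gives shares of 27.3%, 63.3%, and 9.4% for close-up, medium, and long shots, respectively.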

Close-up dubbing

According to Bowen (2017), close-up shots hold particular significance as they serve as a powerful tool in the language of film for conveying meaning. Close-ups not only depict the subject in a detailed manner but also become more prevalent in dramatic situations as a film progresses toward its climax. In this study, the dialogues in both the original and dubbed versions of close-up shots are analyzed, and the observed differences show instances where the dubbed dialogues diverge from the original dialogues due to synchronization challenges.

In the film Haeundae, a scene set at a baseball stadium presents an instance of phonetic synchronization. Man-sik accompanies Yeon-hee to Sajik Stadium to watch a Lotte Giants game, where he becomes increasingly disheartened by the team’s repeated errors and poor performance, leading him to drink excessively. During the scene, there is a close-up shot of Yeon-hee chanting “Mamama” (마마마), followed by a shot of her restraining Man-sik from cursing at the players. The chant, loudly shouted by Yeon-hee and the rest of the crowd, is commonly used during Lotte Giants games to express dissatisfaction when an opposing pitcher attempts to keep a runner close to the base. The English dubbing of the chant, displayed in Table 3 below, does not transliterate the original.

Table 3 English dubbing of the “Mamama” chant in the baseball scene of Haeundae (2010).

In the English-dubbed version, the Korean word “ma” is rendered as “ba,” with both words sharing a bilabial consonant onset, ‘ㅁ’ and /b/, respectively. The underlined line from Man-sik in the table above elucidates the rationale for the modification to “ba” rather than a transliteration as “ma”: the original line, “You son of a bitch!”, is dubbed as “Son of ba-ba-ba-bi-tch!” The connection between Man-sik’s dubbed line and the cheer “bababa” represents a creative adaptation, given the difficulty of conveying the meaning of “mamama” in English dubbing.

Isochrony was meticulously maintained in all of the analyzed texts. The consistent synchronization of lip movements, even in scenes where only lip movement occurs without dialogue, underscores the significance of isochrony in film dubbing. An example of isochrony can be observed in the film I Saw the Devil, shown in Table 4 below, wherein Kim Soo-hyun visits, one by one, several suspects linked to his fiancée’s murder. Among the suspects is an individual named “Jjang-gu,” and Soo-hyun’s objective is to ascertain whether he is the true culprit. Soo-hyun seizes him from the back of his motorcycle, throws him to the ground, and compares his face to the suspect’s photograph displayed on his phone.

Table 4 Example of isochrony in I Saw the Devil (2010).

The line “What are you, asshole,” heard off-screen, occurs during a close-up shot of Jjang-gu. The Korean original contains no line at this point, but the English dubbing inserts “What the hell?”, as shown in the last line of Table 4. Since the character’s mouth is open while he is being forcibly subdued, the audience might expect a line of dialogue; accordingly, the English dubbing adds one to fit the situation.

Medium shot dubbing

The analysis suggests that medium shots are the most prevalent in dialogue scenes. Of the 7625 on-screen shots, 2083 (27.3%) are close-up shots, 4826 (63.3%) are medium shots, and 716 (9.4%) are long shots, making medium shots the most dialogue-intensive. Medium shots show two primary characteristics. First, reduction emerges as the dominant dubbing strategy. Second, dialogue scenes built on medium shots frequently adopt the shot/reverse-shot pattern, creating a dynamic in which the speaker’s line (shot) and the listener’s response (reverse shot) are visually disconnected.

The most common form of reduction observed in medium shots involves the elimination of repetition. Repetition refers to the recurrent use of identical or similar vocabulary in the interlocutor’s speech. It serves as a communication strategy to establish coherence and meaningful interaction and is also a prevalent feature in Korean discourse and film dialogues. For instance, in the film Veteran, there is the repetition of names such as “Do-cheol-ah, Do-cheol-ah!” (도철아! 도철아!) or the repetition of words like “You’re a veteran, veteran” (베테랑이시네, 베테랑). However, in English dubbing, these repetitions are omitted, with “Do-cheol-ah” and “veteran” each uttered only once.

While subtitling often omits repetition due to space constraints, the findings indicate that similar reductions occur in dubbing, albeit for different reasons. English dialogue tends to avoid direct repetition, favoring varied expressions or paraphrasing to maintain the natural flow of speech. For instance, in Memories of Murder, the repeated affirmation “예, 예, 예, 예, 예” (where “예” is essentially the same as “Yes”) is dubbed as “Yeah, right. Fine. Okay,” illustrating how English dubbing substitutes repetition with varied expressions. In I Saw the Devil, the protagonist’s repetition of “이렇게? 이렇게? 이렇게! 이렇게! 이렇게! 이렇게!” (where “이렇게” means “like this”) is transformed into “How about that? Huh? Is that what you want? You son of a bitch!” in the English dubbing. This substitution not only eliminates repetition but also adheres to English speech patterns.

In medium-shot dialogue scenes, the audience perceives the action-reaction dynamic through alternating shots between on-screen speakers and reverse shots of listening characters. This technique frequently causes a character’s dialogue to be interrupted by reaction shots, and discrepancies between the original and dubbed versions are identified in these cases. In the original version, dialogue flows uninterrupted across shots, whereas in the dubbed version the sentence concludes with the shot transition. This separation of sentences at the boundary between medium and reaction shots can be regarded as a characteristic of the dubbed version. Table 5 shows the Queen’s response to Ha-seon in the film Masquerade, after Ha-seon refuses the Queen’s suggestion that they go to bed together, citing a sudden occurrence.

Table 5 Sentence segmentation in medium-shot scene in Masquerade (2016).

During the reverse shot, the audience does not see the movement of the Queen’s lips, whereas Ha-seon is displayed on the screen. In the dubbed version, the original line “Your Highness is” (주상께선) is modified to “I’m sure it can wait.” This adjustment not only conveys the Queen’s disagreement with Ha-seon more explicitly but also splits the dialogue into separate sentences matching the two shots, whereas the original line runs continuously across both.

Long shot dubbing

Long shots, also referred to as “full-body” shots, capture the character from head to toe. In dubbing these shots, the primary focus is on aligning the dialogue with the character’s body movements. Extreme long shots (XLS), by contrast, naturally limit audible dialogue because of the character’s distance from the camera. Of the 7625 on-screen shots with dubbed dialogue, long shots account for only 716 (9.4%), while XLS constitute a mere 169 shots (2.2%).

What is noteworthy in the dubbing of long shots is the emphasis on kinetic synchronization. For instance, in a scene from the film Haeundae, Man-sik and his friends are shown together in a long shot. When they run out of soju, a popular alcoholic beverage in Korea, Yeon-hee purchases more and enters the frame while Sang-ryeol reaches out to take the soju from her hand. In this scene, Man-sik and Woo-sung engage in an argument in which Man-sik mentions his academically successful child and compares them to Woo-sung’s child. In the original, while listening to their exchange, Sang-ryeol says to Man-sik, “이 얍삽한 새끼야” (“You’re a shallow bastard”) simultaneously with his reaching gesture. In the dubbing, instead of a faithful translation of Sang-ryeol’s criticism of Man-sik, the line is modified to “Give me that” to align with his body movement. This adjustment demonstrates how dubbing often prioritizes synchronization with body movements over preserving the original dialogue’s content.

Off-screen dubbing

Off-screen dubbing strategies

The analysis of off-screen dubbing shows a distinct pattern: it differs from on-screen dubbing in two ways. First, since the characters are not on screen, voice actors are less bound by synchronization, so changes in meaning are more pronounced. Second, inconsistencies with the visual information also appear.

Dubbing categorized by sound type exhibits a lower reduction rate than dubbing categorized by shot size. Moreover, the chi-square test examining the difference between on-screen and off-screen dubbing yields a p value of approximately zero, indicating that the choice of translation strategy is significantly associated with whether the character is on-screen. The chi-square test is commonly used to analyze relationships between categorical (nominal) variables rather than continuous ones. Given that the data in the current study are categorical, classified according to translation strategy, the chi-square test was deemed appropriate. The test also imposes conditions on categorical data: the results are meaningful when the expected frequency of every cell is greater than one, no more than 20% of cells have an expected frequency of five or less, and the observations are independent. All of these conditions are satisfied in the current study.
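The procedure described above can be sketched in pure Python. The counts below are hypothetical (a 2 × 3 table of two screen conditions by three strategies, not the study’s actual data), and the function names are illustrative; the expected-frequency conditions are the ones stated in the paragraph.

```python
def chi_square_independence(table):
    """Pearson chi-square test of independence for an r x c count table.

    Returns the test statistic, degrees of freedom, and the table of
    expected frequencies under the null hypothesis of independence.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    # Expected count in each cell: (row total * column total) / grand total
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    stat = sum((o - e) ** 2 / e
               for orow, erow in zip(table, expected)
               for o, e in zip(orow, erow))
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df, expected


def conditions_met(expected):
    """Check the applicability conditions: every expected frequency > 1,
    and no more than 20% of cells with an expected frequency of 5 or less."""
    cells = [e for row in expected for e in row]
    small = sum(1 for e in cells if e <= 5)
    return min(cells) > 1 and small / len(cells) <= 0.20


# Hypothetical counts: rows = on-/off-screen, columns = three strategies
table = [[500, 300, 200],
         [200, 250, 150]]
stat, df, expected = chi_square_independence(table)
# Compare against the 5% critical value for df = 2 (5.991)
significant = stat > 5.991
```

A statistic above the critical value rejects the null of independence, i.e., the distribution of strategies depends on whether the character is on-screen; in practice a library routine such as `scipy.stats.chi2_contingency` would also supply the exact p value.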

In the context of off-screen dubbing, there exists greater flexibility to modify dialogue. Off-screen sound can be confused with voiceover, as both involve dialogue delivered by a character not visible on the screen. However, off-screen sound comes from a character present in the same physical space as those on-screen; in other words, the speaker of off-screen sound can be revealed by a shift or movement of the camera. An example of modified dubbed dialogue featuring off-screen sound can be found in the film The Man from Nowhere. Oh Sang-man is a former surgical practitioner who, after serving time for drug offenses, works with a gang of organ traffickers. When he first appears in the film, his nickname “Five Hundred” is introduced, reflecting his wish to cut open the stomachs of five hundred people. Later in the film, in a shot of Oh being found with his eyeballs removed, the detective’s words are heard off-screen. In the original Korean version, the detective’s line, “지 눈깔이 뽑혔네, 이 새끼” (“You had your own eyes pulled out, you bastard”), simply reaffirms the image of Oh Sang-man’s disfigured face. In the English dubbing, however, the line is modified to, “I see he wanted to see his five hundredth victim so badly [by making himself a victim].” This adaptation serves as a reminder of Oh’s nickname and enriches the off-screen dialogue with previously established information.

Second, inconsistencies with the visual information are found in voiceover narration. Voiceover narration, or internal diegetic sound, projects a character’s thoughts or narration without a visible sound source on the screen. The character’s voice is superimposed, and the narration is asynchronous with the on-screen content. In the film Oldboy, the protagonist’s voiceover narration runs throughout the entire film, appearing in 34 out of a total of 144 scenes. A notable instance occurs when the protagonist, Dae-su, sets out to find the place of his confinement with a clue from a Chinese restaurant named “Cheongryong” where he used to eat dumplings daily. Despite an exhaustive search of the restaurant, he finds no clues. In a moment of despair, he discovers an advertisement for a Chinese restaurant named “Jacheongryong” in the phone book. The character “자” (ja, meaning violet) from the phone book combines with the name “청룡” (Cheongryong, meaning blue dragon) to form the name “Jacheongryong.” The subtitle appears as “Violet Blue Dragon,” a literal translation of the meaning, and a line of Korean dialogue follows: “A violet-blue dragon. What on earth could it mean?” In the dubbed dialogue, however, the restaurant’s name is given as “Magic Blue Dragon.” In this case, the discrepancy between the voiceover narration and the subtitles could potentially confuse the audience.

On-mute dubbing

On-mute dubbing refers to the insertion of additional recorded dialogue absent from the original track (Spiteri Miggiani 2019a: 131). The analysis reveals that on-mute dubbing is employed in various contexts. These cases can be categorized into two groups: describing on-screen events and conveying supplementary narrative information.

Firstly, on-mute dubbing is used to provide explanations of on-screen events. An instance of this can be observed in a scene from the film Haeundae where Dong-chun and Yeon-hee are walking home after a baseball game and stumble upon a box. While the camera captures Dong-chun and Yeon-hee walking together, Dong-chun disappears when Yeon-hee opens the door to her house; only his voice is heard off-screen. As Yeon-hee discovers the mysterious box, Dong-chun’s initial words in Korean are, “오데 가서 묵지?” (“Where should I eat?”) This line reflects his inner thoughts after Yeon-hee declines his request for a meal. However, the English dubbing introduces an additional line: “Hey, you got a package. Wonder what it is.” This on-mute dialogue explains both the arrival of the mysterious package and the inquisitive expression on Yeon-hee’s face.

Secondly, on-mute dubbing conveys additional narrative information. For instance, in Memories of Murder, a scene depicts a policeman and a farmer searching for a victim in a high-angle shot that renders the characters small and distant. The considerable distance between the camera and the subjects means that the characters’ mouths are not visible on screen and the dialogue is barely audible. In the original Korean version, there is no discernible dialogue, and the audience in the source culture can only deduce the situation from the farmer’s gestures pointing somewhere and the policeman’s urgent demeanor. In the English-dubbed version, however, the farmer’s line, “She’s over here. She’s right over this way. Here! I found her this morning,” is explicitly conveyed. This example of on-mute dubbed dialogue demonstrates the intention to give the audience a clear understanding of what is transpiring on screen and what will occur in the subsequent shot.

The difference between on-screen and off-screen dubbing

To assess whether there is indeed a distinction in translation strategies when characters are on-screen versus off-screen, a statistical test was conducted. As shown in Table 6, there is a noticeable difference in the frequency and percentage of on-screen and off-screen dubbing. However, it is crucial to determine if this difference is statistically significant.

Table 6 Statistical analysis of on-screen and off-screen dubbing strategies.

In Table 6, the chi-square statistic for on-screen and off-screen translations is χ²(10, N = 12,543) = 749.02, p < 0.001. This result indicates that the distribution of dubbing strategies is not independent of screen presence: different dubbing strategies are employed in the analyzed texts depending on whether characters are on-screen or off-screen.

Despite this statistical difference, both on-screen and off-screen dubbing share reduction as the predominant dubbing strategy. Reduction accounts for 50% of on-screen dialogue dubbing and 39% of off-screen dialogue dubbing. This raises an important question: why is reduction so prevalent in Korean-to-English dubbing? Pellegrino et al. (2011) conducted a comparative analysis of seven languages, including English and Spanish, on the assumption that languages trade off speech rate (see Footnote 4) against information density (see Footnote 5). They concluded that while speech rate and information density differ significantly across languages, all languages share “the same overall communicative capacity” (p. 542). Oh (2015) analyzed the information rate (see Footnote 6) of 18 languages under the same assumption, including Korean and English. In Oh’s findings, Korean is classified as a language with a high speech rate, while English belongs to the group of languages with high information density. Since this study analyzes film dialogue segmented into shots, the speech rate and information density of Korean and English play a more significant role than their overall information rate: smaller language samples, like the segmented dialogue analyzed here, are more sensitive to each language’s speech rate and information density (Oh 2015: 165). Thus, the speed and density of the individual languages matter more in the current study than the universality of information rate. Consequently, it can be inferred that when Korean, a fast-paced language, is dubbed into English, an informationally dense language, reduction becomes particularly pronounced.
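The trade-off described above can be made concrete with a small arithmetic sketch. The rates below are hypothetical round numbers in the spirit of Pellegrino et al. (2011) and Oh (2015), not their measured values; the point is only the mechanism, a fixed screen time buys fewer English syllables than Korean ones.

```python
# Hypothetical, illustrative rates (NOT the published measurements):
# Korean modeled as fast but informationally sparse per syllable,
# English as slower but informationally dense per syllable.
korean_speech_rate = 7.0     # syllables per second
korean_density = 0.60        # information per syllable (normalized units)
english_speech_rate = 5.5    # syllables per second
english_density = 0.90       # information per syllable (normalized units)

# "Information rate" = speech rate x information density; the trade-off
# hypothesis says this product is roughly comparable across languages.
korean_info_rate = korean_speech_rate * korean_density      # ~4.2
english_info_rate = english_speech_rate * english_density   # ~4.95

# For a line lasting 2 seconds of screen time (isochrony fixes duration),
# the syllable budget available in each language:
duration = 2.0
korean_syllables = korean_speech_rate * duration    # 14 syllables
english_syllables = english_speech_rate * duration  # 11 syllables

# Fewer English syllables fit in the same screen time, so the dubbed
# line must shed material: one driver of the reduction strategy.
reduction_ratio = english_syllables / korean_syllables  # ~0.79
```

Under these assumed numbers, an English dub has roughly four fifths of the Korean syllable budget for the same shot duration, which is consistent with reduction being the dominant strategy.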

The Z-test is a statistical test used to determine whether two independent samples differ significantly in terms of proportions or means. Unlike the T-test, which is typically used when population variances are unknown, the Z-test requires that the population variance or standard deviation be known, theoretically or empirically. In this study, the Z-test was used to evaluate whether the reduction strategies in on-screen and off-screen dubbing differ significantly. The null hypothesis assumes that both samples, on-screen and off-screen dubbing reductions, are drawn from the same population, meaning no significant difference exists between the two groups. The Z-test yielded a z-score of −12.341 with p < 0.001, indicating a statistically significant difference between the two groups. Since the p value falls below the 0.05 threshold, the null hypothesis was rejected. This result confirms a clear distinction in reduction depending on whether the character is on-screen or off-screen, and shows that reduction is used less frequently when the character is off-screen.
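The two-proportion Z-test above can be sketched in pure Python. The raw counts below are illustrative back-calculations (50% of the 7625 on-screen shots and 39% of the remaining 4918 off-screen shots, rounding 12,543 − 7625), so the resulting z-score approximates, but does not exactly reproduce, the reported −12.341; the function name is illustrative.

```python
import math


def two_proportion_z(success1, n1, success2, n2):
    """Z-test for the difference between two independent proportions.

    Uses the pooled proportion to estimate the standard error under the
    null hypothesis that both samples come from the same population.
    """
    p1, p2 = success1 / n1, success2 / n2
    pooled = (success1 + success2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (p1 - p2) / se


# Illustrative counts: reduction in ~50% of 7625 on-screen lines vs
# ~39% of 4918 off-screen lines (percentages from the study; the raw
# counts are reconstructed for the example).
z = two_proportion_z(3813, 7625, 1918, 4918)
significant = abs(z) > 1.96  # two-tailed 5% threshold
```

The sign of z depends only on the order of the two groups; what matters is that its magnitude far exceeds 1.96, so the null of equal reduction rates is rejected.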

Conclusion

The main objective of this study is to identify whether there is a difference in the approach to dubbing depending on whether characters are on-screen or not. The results indicate that on-screen and off-screen dialogue dubbing differ significantly, suggesting that the analyzed texts employ distinct translation strategies for each.

On-screen and off-screen dubbing share the common characteristic of adopting reduction as the predominant dubbing strategy. Speech rate and information density vary by language, and Korean and English differ on both. Contrary to the common belief that Korean is informationally denser because some of its words derive from Chinese characters, Korean generally has a faster speech rate but a lower information density than English (Oh 2015: 48). As a result, the syllable count is likely to shrink in English dubbing, which the translator and/or dubbing script writer should take into account.

The on-screen dubbing dialogues diverged from the original dialogues due to synchronization requirements. Different shot sizes exhibited distinct characteristics that influenced various aspects of dubbing, with medium shots being the most frequently used for dialogue scenes. In some cases, particularly in medium shots and shot/reverse shots, the dubbed dialogue truncated sentences at the shot transition. Unlike the original version, where the speaker’s sentence continues regardless of the cut, the dubbed version ends the sentence just before the cut and begins a new sentence when the cut switches to the other character’s face. Analyzing this phenomenon from a perceptual standpoint could help determine whether sentence separation at shot transitions aids audience comprehension; aligning transitions and sentence boundaries may reduce the audience’s cognitive load.

For off-screen dubbing, voice actors as well as translators and dubbing script writers are less bound by synchronization requirements, resulting in more noticeable changes in meaning. Moreover, on-mute dubbing, which involves adding lines of dialogue that are not present in the original, was a prominent feature of off-screen dubbing. These dialogue additions are particularly interesting, as the clear emphasis on addition rather than deletion highlights dubbing’s academic potential (Ranzato 2011: 121).

The need for interdisciplinary research in translation and film studies was first proposed at the beginning of the 21st century, but limited empirical, data-based work has been conducted. This study addresses that gap by presenting empirical research that applies film editing units, an area previously recommended for future exploration in dubbing studies (Chaume 2004a; Matamala 2010). Additionally, a novel research methodology is employed here through the use of the multimodal corpus tool ELAN, which provides a new structure for examining the multimodal aspects that arise in media translation. To approach audiovisual translation from a comprehensive perspective, it is essential to understand the film as a text in its own right; by grounding its analysis in the textual nature of film, this study offers a valuable contribution to translation research. Finally, this research suggests several implications for current dubbing practices. It emphasizes the need for distinct strategies when dubbing on-screen versus off-screen dialogue, with reduction being a common strategy due to English’s slower speech rate and higher information density compared to Korean. Dubbing professionals should anticipate and adjust for syllable reduction to ensure natural synchronization.

Nonetheless, this study has limitations, particularly in its use of original and dubbed versions of Korean films without genre restrictions. Since each film genre exhibits distinct characteristics, this approach may overlook genre-specific traits. Future research could address this limitation by building a larger corpus of translated film dialogue or conducting genre-specific analyses. Building a comparable corpus alongside parallel corpora, such as the Pavia Corpus of Film Dialogue, could foster more diverse discussions in the field. Due to its relatively short history, English dubbing lacks established norms and conventions (Spiteri Miggiani 2021b: 138). This absence not only calls for increased contributions from diverse language pairs but also provides greater flexibility to adopt new strategies and develop a unique identity within dubbing practices. Furthermore, future studies could explore audience reception to provide practical insights into how different dubbing strategies impact comprehension and viewer enjoyment.