Introduction

The research article (RA) title is not merely a linguistic label but a rhetorical act central to the practice of science and the dissemination of knowledge (Soler 2011; Hao 2024), serving as the first point of contact between a text and its prospective readers (Orbay et al. 2025). Concise yet rhetorically dense, RA titles both summarise content and position research within disciplinary discourse communities (Soler 2011; Cleland 2025). They signal research topics, methods, and sometimes results, while also performing persuasive functions that enhance visibility and accessibility in an increasingly competitive publishing landscape (Goodman 2010; Hyland and Zou 2022; Lehmann 2022; Cleland 2025). For these reasons, the study of RA titles has attracted extensive scholarly attention within genre analysis, English for Specific Purposes (ESP), English for Academic Purposes (EAP), and academic writing pedagogy, with research examining their structural, syntactic, and rhetorical features across disciplinary and cultural contexts (e.g., Martín and León-Pérez 2024; Matsubara 2024; Perdomo and Morales 2024; Wang et al. 2024; Brett 2025; Cleland 2025; Orbay et al. 2025).

At the same time, recent and disruptive advances in generative artificial intelligence (GAI), particularly large language models (LLMs) that generate texts in response to user prompts, are increasingly impacting higher education and reshaping the landscape of scientific research and academic writing (Nguyen et al. 2024; Ou et al. 2024; Zou et al. 2025). Chat Generative Pre-trained Transformer (ChatGPT), one of the leading GAI-powered tools, has attracted considerable attention in academic writing for its advanced natural language processing capabilities and efficiency in generating human-like texts (Tsai et al. 2024). ChatGPT has rapidly become a versatile writing assistance tool capable of performing a multitude of language-based tasks, ranging from relatively simple to more complex ones (Meniado et al. 2024). Beyond generating topics, ChatGPT can develop outlines, provide high-quality corrective feedback on grammar and vocabulary, deconstruct complex texts, and suggest strategies for improving clarity, readability, and cohesion (Barrett and Pack 2023; Kaebnick et al. 2023; Lund et al. 2023; Tsai et al. 2024).

Within the context of academic writing and publishing, GAI has the potential to revolutionise the scholarly publication landscape and streamline multiple stages of the research process and science communication (Hosseini and Horbach 2023; Kaebnick et al. 2023; Lozić and Štular 2023; Lund et al. 2023). For authors, ChatGPT, when used responsibly, can offer a range of practical benefits, such as assisting with the formulation of research ideas, identifying research gaps, synthesising large volumes of data, revising and proofreading manuscripts, translating or paraphrasing texts, identifying suitable journals for submission, and ensuring alignment with journal conventions (Cotton et al. 2024; Gilat and Cole 2023; Ingley and Pack 2023; Jiao et al. 2023; Kaebnick et al. 2023; Karakose 2023; Lozić and Štular 2023; Özçelik 2023; Nguyen et al. 2024). For editors and peer reviewers, ChatGPT can provide tangible benefits by automating tedious routine tasks (Gilat and Cole 2023; Ebadi et al. 2025), detecting instances of plagiarised content (Huang and Tan 2023; Schulz et al. 2022), promoting greater consistency in peer review (Hosseini and Horbach 2023; Schulz et al. 2022), and generating constructive feedback (Liang et al. 2024; Marrella et al. 2025).

Although this field is still in its infancy, a rapidly growing body of research on GAI and academic writing has begun to take shape. This emerging scholarship has examined a multitude of subjects, including the potential uses of ChatGPT in academic writing, particularly L2 writing, as both a writing assistant and an assessment tool (e.g., Lozić and Štular 2023; Mizumoto and Eguchi 2023; Nurseha 2023; Özçelik 2023; Yan 2023; Nguyen et al. 2024; Pham and Le 2024; Zou et al. 2025), as well as the experiences and perceptions of language teachers and learners regarding the appropriate uses of GAI in the writing process (e.g., Barrett and Pack 2023; Nguyen 2023; Xiao and Zhi 2023; Madden et al. 2025).

To date, however, only a few studies have directly compared ChatGPT-generated texts with those authored by humans, and these have focused primarily on extended academic genres such as essays (e.g., Jiang and Hyland 2025a, 2025b, 2025c), bachelor’s and master’s theses (e.g., Nowacki and Wrochna 2025), and abstracts (e.g., Gao et al. 2023; Huang and Deng 2025; Zhang and Zhang 2025). Taken together, these studies suggest that ChatGPT is capable of producing extended academic texts that approximate human writing at a macro-discourse level, displaying structural coherence and functional coverage of key genre moves.

Despite these insights, existing research has overwhelmingly focused on extended academic genres, leaving the generation of highly compressed and convention-sensitive micro-genres such as RA titles largely underexplored. In contrast to longer genres, RA titles distil complex research content into a highly conventionalised and disciplinarily situated form, where redundancy and elaboration are necessarily minimised. Under such constraints, even relatively small lexical, syntactic, or rhetorical variations can carry disproportionate communicative weight and thus become analytically salient. Beyond their formal properties, RA titles function as the primary point of entry between a study and its potential readership, playing a central role in indexing, visibility, and initial evaluation within academic publishing. This function is particularly pronounced in medical research, the disciplinary context examined in the present study, where titling practices are shaped by strong expectations of precision, transparency, and evidential accountability.

From a genre-analysis perspective, RA titles therefore provide a sensitive site for examining whether GAI systems align with established disciplinary conventions, tend to amplify recurrent patterns, or produce surface-level approximations that diverge from human-authored norms. Against this backdrop, the present study addresses an underexplored area in the literature by systematically comparing human-authored and ChatGPT-generated medical RA titles across a range of micro-level features, notably length, form, syntactic structure, and content focus. By foregrounding a highly conventionalised academic micro-genre, the study extends existing research on GAI in academic writing beyond longer text types and offers empirical insight into ChatGPT’s handling of constrained disciplinary discourse. The findings further inform ongoing discussions in EAP by elucidating how GAI-generated titles may interact with established genre norms, with implications for genre awareness and pedagogical practice.

Literature review

Research article titles

Swales’ (1990) observation that titles were little considered has inspired a growing body of scholarship on titleology, revealing disciplinary variations, new trends, and evolving norms (Goodman 2010; Salager-Meyer et al. 2017; Xiang and Li 2020; Heßler and Ziegler 2023; Jiang and Hyland 2023; Martín and León-Pérez 2024; Brett 2025). Specifically, extant research into medical titling practices suggests that medical RA titles are highly conventionalised and rhetorically intricate forms with language-specific features. One of the most frequently examined of these features is length (e.g., Jiang and Hyland 2023; Hao 2024; Martín and León-Pérez 2024); this research reveals that medical titles tend to be longer than those in other disciplines, a characteristic that has been linked with more downloads and higher citation rates (Habibzadeh and Yadollahie 2010; Heßler and Ziegler 2022). Other linguistic features that have been examined are title syntactic form and semantic content (Busch-Lauer 2000; Goodman et al. 2001; Kerans et al. 2020; Hao 2024; Matsubara 2024). The findings show that medical RA titles tend to be compound (colon-separated), informative, nominal, and methods-oriented. Medical RA titles do not merely summarise the research reported but act as stand-alone texts indicating its clinical relevance (Goodman 2000). This underpins the epistemic function of RA titles in medical discourse, mediating between research production and its professional applications.

GAI in academic writing

As one of the most advanced versatile AI LLMs, ChatGPT operates through a transformer-based architecture and a two-stage training process comprising pre-training and fine-tuning (Cotton et al. 2024; Hai 2023). The transformer model employs self-attention mechanisms to capture syntactic and semantic relationships within language, enabling the model to produce coherent and contextually appropriate responses based on user prompts (Ray 2023; Chang et al. 2024). During pre-training, ChatGPT acquires general linguistic patterns and world knowledge from extensive text corpora, while the fine-tuning phase adapts the model for specific communicative tasks through reinforcement learning from human feedback (RLHF), ensuring greater alignment with user expectations and genre-specific conventions (Hai 2023; Howard et al. 2023; Chang et al. 2024). These mechanisms underpin ChatGPT’s ability to generate human-like academic texts.
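To make the mechanism concrete, the following minimal sketch in Python (using numpy) illustrates scaled dot-product self-attention, the operation through which transformer models weight relationships between tokens. The matrices are random stand-ins rather than learned parameters, so the sketch illustrates the computation only, not ChatGPT itself.

```python
import numpy as np

def self_attention(X, W_q, W_k, W_v):
    # Project token embeddings into query, key, and value spaces
    Q, K, V = X @ W_q, X @ W_k, X @ W_v
    scores = Q @ K.T / np.sqrt(K.shape[-1])               # pairwise relevance between tokens
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)        # softmax over each row
    return weights @ V                                    # context-weighted token representations

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 8))                               # 5 tokens, 8-dimensional embeddings (toy values)
W_q, W_k, W_v = (rng.normal(size=(8, 8)) for _ in range(3))
print(self_attention(X, W_q, W_k, W_v).shape)             # (5, 8)
```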

Existing research comparing human-authored and ChatGPT-generated academic texts, albeit limited, has concentrated largely on metadiscourse use in argumentative essays and abstracts. This line of enquiry suggests that ChatGPT can reproduce macro-level features of academic discourse with considerable fluency. However, the model has a reduced capacity to reflect the rhetorical diversity and interpersonal complexity characteristic of human academic writing. For instance, Jiang and Hyland (2025a, 2025b, 2025c) examined engagement markers in argumentative essays written by students and those generated by ChatGPT. Their findings indicate that while it can generate structurally coherent essays, ChatGPT exhibits a limited ability to build interactional arguments and to display the rhetorical flexibility and evaluative complexity found in student-authored essays. Similarly, Zhang and Zhang (2025) explored disciplinary variations in metadiscourse use in RA abstracts generated by ChatGPT and those written by human authors. They observed broad similarities in overall discourse functions but notable differences in frequencies and distributions. Relatedly, Huang and Deng (2025) investigated ChatGPT’s use of shell nouns when generating abstracts for dissertations. They reported that ChatGPT-generated abstracts were more promotional and repetitive, but displayed less authorial visibility. Recent work has also examined AI-generated language in educational contexts by comparing AI output with human feedback or writing support, further reflecting a growing interest in evaluating AI performance against human benchmarks (Alnemrat et al. 2025).

Despite these advances, research has largely overlooked shorter, rhetorically dense genres such as RA titles. As the first point of contact between a text and prospective readers, RA titles play a critical role in positioning and promoting the article in a highly competitive academic publishing landscape. This role is amplified in medicine, a field in which formal convention and rhetorical purpose are closely intertwined. Medical RA titles therefore provide a revealing lens for examining the extent to which GAI can replicate human rhetorical practices that privilege informativeness and precision. Examining human- and GAI-generated RA titles not only addresses a notable void in the literature but also raises critical pedagogical questions about whether ChatGPT might reproduce or reinforce formulaic tendencies in academic writing within EAP contexts.

Therefore, the present study seeks to fill this gap by comparing human-authored and ChatGPT-generated medical RA titles across multiple micro-level features, notably length, form, syntactic structure, and content focus. In doing so, it contributes to the understanding of ChatGPT’s potential in academic writing and underscores the need for critical engagement with GAI as both a resource and a challenge for genre pedagogy.

Methods

Data collection and corpus

This comparative analysis examines two corpora of medical RA titles to uncover the extent to which ChatGPT converges or diverges from human titling practices. The first corpus consists of 300 RA titles selected from three leading general medical journals, namely The Lancet, The BMJ, and JAMA, each represented by 100 titles. These journals were selected not only for their high impact but also because prior genre-based research has highlighted their highly conventionalised RA titling practices, particularly the frequent use of nominal constructions and multi-unit titles (Goodman et al. 2001; Busch-Lauer 2000; Kerans et al. 2020). As such, they exemplify dominant genre norms in medical research publishing and provide a robust site for examining micro-level features such as title length, form, syntactic structure, and content focus. Limiting the dataset to three journals with well-established and relatively homogeneous titling conventions made it possible to reduce editorial and disciplinary variability while ensuring sufficient data to identify stable genre patterns.

To account for potential journal-specific influences on title features, we reviewed the submission guidelines of each journal regarding title word limits, punctuation, formatting conventions, and AI policy. All three journals permit colon-separated multi-unit titles and nominal or verbal constructions, impose modest word limits (ranging from 15 to 20 words), and require clear disclosure of AI use if applied. Specifically, The Lancet allows AI only for language improvements with full disclosure; The BMJ recommends informative titles with optional subtitles indicating study design; and JAMA requires concise, specific titles and discourages undeclared AI-generated content. None of the titles in our dataset indicated AI-assisted generation, minimizing the likelihood that human-authored titles were influenced by AI. These considerations guided the selection of titles and ensured consistency and transparency in the dataset.

Only original RAs were included; reviews, perspectives, editorials, and other article types were excluded to maintain comparability in genre and micro-level features of titles. Titles were collected starting from the first issue of 2022 and proceeding chronologically until 100 original research articles were obtained from each journal, yielding the 300 human-authored titles (HATs) in the first corpus.

The second, parallel corpus consists of 300 medical RA titles generated by ChatGPT. For each human-authored title, ChatGPT was prompted to generate a parallel title by providing the article’s abstract as input and requesting an appropriate RA title. This procedure yielded 300 ChatGPT-generated titles (CGTs), ensuring correspondence between the two corpora. Overall, the two corpora comprised a total of 600 RA titles. Details of the two corpora are provided in Table 1.

Table 1 Summary of the data corpus.

Prompting procedure

To generate the second corpus of RA titles, ChatGPT was used as a text-generation tool under controlled prompting conditions. All titles were generated using the same ChatGPT version (ChatGPT 4.0), accessed via the OpenAI web interface. No custom fine-tuning was applied during text generation, as parameter adjustment is not available in the free GPT-4.0 web interface.

For each human-authored RA title in the corpus, ChatGPT was prompted once to generate a parallel title using the corresponding article abstract as input. The prompt was reused verbatim across all 300 generations to ensure consistency and minimise variability introduced by prompt phrasing. To reduce potential bias introduced by prompt wording, the prompt was intentionally framed in neutral, genre-oriented terms, without referencing specific stylistic features, structural patterns, or evaluative criteria (e.g. length, phrasing, or rhetorical strategies). This design aimed to elicit titles based on ChatGPT’s internalised representation of medical RA titling conventions rather than steering outputs toward predetermined forms. The full prompt text was as follows:

“You are an expert academic author. Please generate a concise and informative title for the following medical research abstract. Ensure the title is accurate, follows typical conventions for medical research articles, and is suitable for publication in a high-impact medical journal.”

The abstract of the corresponding article was then provided immediately after the prompt. Only one title was generated per abstract, and no iterative prompting, regeneration, or post-selection was conducted. This one-shot prompting approach was adopted to avoid introducing researcher bias through selective choice among multiple outputs.
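Although the titles in this study were generated through the ChatGPT web interface rather than programmatically, the one-shot procedure can be sketched in Python as follows. The client calls, model name, and helper function are illustrative assumptions based on the OpenAI Python library, not the exact setup used in the study.

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

PROMPT = (
    "You are an expert academic author. Please generate a concise and informative "
    "title for the following medical research abstract. Ensure the title is accurate, "
    "follows typical conventions for medical research articles, and is suitable for "
    "publication in a high-impact medical journal."
)

def generate_title(abstract: str, model: str = "gpt-4o") -> str:
    """Generate exactly one candidate title per abstract (no regeneration or post-selection)."""
    response = client.chat.completions.create(
        model=model,  # placeholder model name
        messages=[{"role": "user", "content": f"{PROMPT}\n\n{abstract}"}],
        n=1,          # one-shot: a single output per abstract
    )
    return response.choices[0].message.content.strip()
```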

Analytical framework

Based on analytical frameworks identified in the literature, the two corpora of medical RA titles were examined in terms of length, form, syntactic structure, and content focus. To analyse title length, we used Microsoft Word to count the number of words in each title, treating abbreviations, acronyms, and hyphenated words as single words (see Table 2).

Table 2 Representative RA titles with word counts.
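As a minimal sketch of this counting rule, splitting titles on whitespace treats hyphenated words, abbreviations, and acronyms as single words, mirroring the Microsoft Word count described above; the example title below is invented.

```python
def title_length(title: str) -> int:
    # Whitespace tokenisation: hyphenated words, abbreviations, and acronyms each count once.
    return len(title.split())

# Invented example title: counted as 9 words under this rule.
print(title_length("Early Mobilisation after Hip-Fracture Surgery: A Randomised Controlled Trial"))
```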

To avoid conflating the number of units in titles with their syntactic structure, a tendency observed in previous studies (e.g., Haggan 2004; Soler 2007; Cheng et al. 2012; Morales et al. 2020; Martín and León-Pérez 2024), we followed a top-down approach. First, titles were classified by form into single-unit and multi-unit by counting their units. Single-unit titles consist of only one segment or clause, whereas multi-unit titles are composed of two distinct segments linked by a colon (see Table 3).

Table 3 Representative RA title forms.

Second, single-unit and multi-unit titles were examined separately to identify their syntactic structures (see Table 4). Single-unit titles were further categorised into nominal and verbal constructions: nominal titles consist mainly of one or more nouns with optional modifiers, whereas verbal titles begin with a verb in the -ing form followed by objects or modifiers. Multi-unit titles, on the other hand, were further divided into two compound constructions, nominal-nominal and verbal-nominal.

Table 4 Representative RA title syntactic structures.

Regarding the analysis of title content focus, we followed the typology developed by Goodman et al. (2001). Based on this framework, titles were classified into six types: topic only, methods, results, dataset, methods-results, and methods-dataset. Descriptions and examples of each type are provided in Table 5.

Table 5 Framework of content focus in RA titles.

Coding and analysis

The coding and classification of the data were conducted manually by the two researchers. Prior to any discussion, both researchers coded all 600 titles independently using a predefined scheme covering title form, syntactic structure, and content focus. Inter-rater reliability was assessed using Cohen’s kappa for each coding level across the full dataset. Perfect agreement was obtained for form (κ = 1.00), with no disagreements recorded across the 600 RA titles. For syntactic structure, nine cases of disagreement were identified, yielding a κ value of 0.94. This high agreement for form and syntactic structure is attributable to the discrete and mutually exclusive nature of their categories. For content focus, 36 disagreements were observed, corresponding to a κ value of 0.82. Following the reliability assessment, all disagreements were revisited and resolved through discussion until full consensus was reached. The consensus coding was subsequently used for all quantitative and qualitative analyses. This procedure ensured both the reliability of the coding scheme and the analytical consistency of the final dataset.
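As an illustration, agreement of this kind can be computed from the two coders’ parallel label sequences; the snippet below is a minimal sketch using scikit-learn, with invented labels rather than the study’s actual codes.

```python
from sklearn.metrics import cohen_kappa_score

# Invented parallel codings from two raters for five titles (content-focus categories).
coder_a = ["methods", "topic", "methods", "dataset", "methods"]
coder_b = ["methods", "topic", "results", "dataset", "methods"]

kappa = cohen_kappa_score(coder_a, coder_b)  # Cohen's kappa for two raters
print(round(kappa, 2))
```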

Statistical analyses were performed using IBM SPSS Statistics (Version 27). An independent-samples t-test was used to examine differences in mean title length between human-authored and ChatGPT-generated titles. In addition, chi-square tests were conducted to compare the distributions of title form, syntactic structure, and content focus across the two corpora.
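For readers who prefer open tooling, equivalent analyses can be run outside SPSS; the sketch below uses scipy with invented placeholder data (the corpus values themselves are reported in the Results).

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
human_lengths = rng.normal(20, 5, 300)    # placeholder title lengths in words, not the real corpus
chatgpt_lengths = rng.normal(20, 3, 300)  # placeholder title lengths in words, not the real corpus

# Independent-samples t-test with Welch's correction for unequal variances
t_stat, p_t = stats.ttest_ind(human_lengths, chatgpt_lengths, equal_var=False)

# Chi-square test of independence for a categorical feature (rows: corpora, columns: categories)
observed = np.array([[120, 180], [150, 150]])  # invented frequencies for illustration
chi2, p_chi, dof, expected = stats.chi2_contingency(observed)

print(round(t_stat, 2), round(p_t, 3), round(chi2, 2), round(p_chi, 3))
```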

Results and discussion

Title length

Table 6 presents the distribution of title length across the two corpora. In total, 11,930 title words were analysed, with 6035 words in human-authored titles and 5895 words in ChatGPT-generated titles. The mean title length was 20.1 words (SD = 5.68) for the human corpus and 19.6 words (SD = 3.31) for the ChatGPT corpus. An independent-samples t-test, assuming unequal variances, indicated that this difference was not statistically significant, t(480.9) = 1.23, p = 0.219. The mean difference of 0.47 words had a 95% confidence interval of −0.28 to 1.21. The corresponding Cohen’s d was 0.10, with a 95% confidence interval of −0.06 to 0.26, suggesting a negligible effect size. These results indicate that human-authored and ChatGPT-generated titles are broadly comparable in length.

Table 6 Descriptive statistics for title length.

The similarity between human-authored and ChatGPT-generated medical RA titles in terms of length is noteworthy, particularly given previous research reporting diachronic trends towards longer, more informative titles in the medical sciences (Jiang and Hyland 2023; Martín and León-Pérez 2024). ChatGPT’s close alignment with human authors in this regard indicates that the model has internalised prevailing length-related conventions, reflecting disciplinary expectations for titles that are sufficiently extended to communicate key research aims, methods, and findings. Simultaneously, the absence of statistically significant differences in title length between the human and ChatGPT corpora suggests that ChatGPT does not extend beyond genre-specific norms but instead reflects them in ways that may contribute to their reinforcement. At first glance, this alignment indicates that AI can serve as a reliable tool for replicating normative length patterns, which may support efficiency in writing and adherence to journal standards.

However, this convergence has broader rhetorical and disciplinary implications. By consistently producing titles that conform to dominant length conventions, ChatGPT may inadvertently reinforce homogenization within the field. Over time, repeated reliance on AI-assisted titling could reduce rhetorical variation, diminishing opportunities for creativity or nuanced differentiation among research studies. In particular, the tension between informativeness and conciseness, a key aspect of effective RA titles, may be subtly constrained if AI outputs encourage formulaic lengths rather than prompting authors to tailor titles to specific research contributions or audiences.

This alignment also raises questions about rhetorical stagnation. While longer, informative titles enhance clarity and discoverability, they may also promote over-inflation of methodological detail at the expense of stylistic diversity or interpretive nuance. In extreme cases, this could lead to a proliferation of titles that are technically correct but stylistically uniform, limiting the communicative diversity that underpins dynamic scholarly discourse.

From a pedagogical and EAP perspective, these findings underscore the importance of cultivating GAI literacy. Students and early-career researchers should be encouraged to critically assess AI-generated titles, reflecting on whether length choices optimally serve both disciplinary conventions and rhetorical aims. Educators might emphasise strategies such as iterative prompting, manual refinement, and comparative evaluation of human and AI-generated titles to preserve variation and foster strategic decision-making. In this way, ChatGPT can function not as a replacement for human judgment but as a collaborative tool that supports awareness of genre norms while promoting reflective, context-sensitive authorship.

Title form

As shown in Table 7, both corpora favored multi-unit titles, although their distributions varied significantly. ChatGPT produced a higher proportion of multi-unit titles (275, 91.7%) compared to human authors (249, 83%), whereas human authors employed a greater proportion of single-unit titles (51, 17%) than ChatGPT (25, 8.3%). A Pearson chi-square test of independence indicated that this difference was statistically significant, χ²(1, N = 600) = 10.19, p = 0.001. All expected cell counts exceeded 5, confirming the validity of the chi-square test; the minimum expected count was 38. The corresponding effect size, Cramér’s V = 0.13, 95% CI [0.049, 0.208], suggests a small but meaningful association between title type (ChatGPT-generated vs. human-authored) and title form. These results indicate that ChatGPT exhibits a stronger tendency towards multi-unit constructions compared to human authors.

Table 7 Distribution of title forms.
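The reported association can be checked directly from the Table 7 frequencies: Cramér’s V for a 2×2 table is the square root of χ²/N. The short sketch below recomputes the Pearson chi-square (without continuity correction) and V, which agree with the reported values to within rounding.

```python
import numpy as np
from scipy.stats import chi2_contingency

table = np.array([[275, 25],    # ChatGPT-generated: multi-unit, single-unit
                  [249, 51]])   # Human-authored:    multi-unit, single-unit

chi2, p, dof, expected = chi2_contingency(table, correction=False)  # Pearson chi-square
cramers_v = np.sqrt(chi2 / table.sum())                             # df* = 1 for a 2x2 table
print(round(chi2, 2), round(p, 3), round(cramers_v, 2))             # approx. 10.18, 0.001, 0.13
```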

ChatGPT’s marked preference for multi-unit titles aligns with previous research highlighting the trend towards greater informativeness in medical RA titles. Heßler and Ziegler (2023), for instance, found multi-unit titles in around 98% of articles in The BMJ and The Lancet. Similarly, Kerans et al. (2020) reported that The Lancet featured no single-unit titles, with 93.5% of their medical corpus comprising multi-unit constructions. These patterns reflect the generic conventions of scholarly writing, a tendency noted earlier by Dillon (1981) and Perry (1985), who associate “titular colonicity” (or multi-unit titles) with both scholarly productivity and rhetorical distinctiveness. In medical RAs, multi-unit titles balance informativeness with reader engagement: one segment communicates core content, while the other attracts attention (Sword 2012; Eva 2013). Hyland and Zou (2022) similarly note that such structures increase informational density through elaboration or specification, allowing authors to foreground multiple research aspects and differentiate their work from related studies.

ChatGPT’s consistent production of multi-unit titles likely reflects the training data it has been exposed to, which over-represents such forms. While this demonstrates its capacity to internalise disciplinary norms, it also raises critical concerns. By systematically favoring elaborate, multi-unit constructions, ChatGPT may reinforce dominant conventions and homogenise title forms, potentially limiting rhetorical diversity. By contrast, human authors retain some flexibility, sometimes producing concise, single-unit titles that prioritise readability, memorability, or rhetorical impact (Hartley 2005; Paiva et al. 2012; Hyland and Zou 2022). Over-reliance on AI-generated titles could therefore lead to rhetorical stagnation, where creativity, conciseness, or alternative structuring strategies are underutilised.

These findings also have implications for pedagogy and GAI literacy. Rather than simply adopting AI outputs, researchers and students should critically evaluate the rhetorical trade-offs inherent in multi-unit versus single-unit forms. Educators can guide learners to consider whether AI-generated titles appropriately balance informativeness with engagement, whether they risk overloading methodological detail, and whether the choices serve the communicative and disciplinary purpose of the study. In this way, ChatGPT functions as a supportive tool that reinforces genre conventions while requiring deliberate human oversight to preserve stylistic variation, rhetorical nuance, and disciplinary diversity.

Title syntactic structure

Single-unit titles

As shown in Table 8, single-unit titles were realised only as noun phrases in both corpora. This uniformity is noteworthy given the range of syntactic alternatives available for formulating RA titles, including verbal, full-sentence, and interrogative constructions identified in previous studies (Haggan 2004; Soler 2007).

Table 8 Distribution of single-unit title syntactic structures.

The predominance of nominal constructions in single-unit titles across both corpora echoes previous research findings. Busch-Lauer (2000), for instance, found that approximately 95% of single-unit titles in medical English RAs were nominals. Similarly, Ball (2009) reported a general trend towards nominal titling practices in medical RA titles. This pattern has also been observed across a range of other disciplines (e.g., Perdomo and Morales 2024; Jiang and Jiang 2023; Diao 2021; Morales et al. 2020; Soler 2007; Haggan 2004), suggesting that nominal constructions represent a conventional and widely adopted feature of RA titles. As Wang et al. (2024) observe, the prevalence of nominal titles can be attributed to their capacity to economically condense complex information through modifiers while clearly delineating the focus of the research.

ChatGPT’s replication of nominal constructions demonstrates its capacity to internalise conventional syntactic patterns. While this supports clarity and the efficient communication of complex research content, it also highlights a potential constraint of AI-assisted titling: the reliance on nominals may limit exploration of alternative syntactic forms, such as verbal phrases or interrogatives, which human authors occasionally employ to add emphasis, narrative variation, or rhetorical nuance.

From a pedagogical perspective, these findings suggest that users of GAI should critically assess syntactic choices. In EAP contexts, learners can use ChatGPT outputs as a baseline, refining or diversifying constructions to achieve desired rhetorical effects, maintain reader engagement, and avoid formulaic expression. In this way, AI can assist in following disciplinary conventions while leaving room for deliberate human syntactic decisions.

Multi-unit titles

As with single-unit titles, both human authors and ChatGPT overwhelmingly favoured nominal constructions when producing multi-unit titles (see Table 9), with the two corpora displaying strikingly similar distributions. Specifically, nominal-nominal constructions accounted for 96% of ChatGPT titles and 95.6% of human titles, while verbal-nominal constructions comprised 4% and 4.4% of the respective corpora. A chi-square test of independence confirmed that these distributions did not differ significantly, χ²(1, N = 524) = 0.057, p = 0.812. All expected cell counts exceeded 5, confirming the validity of the chi-square test; the minimum expected count was 10.45. The corresponding effect size, Cramér’s V = 0.010, 95% CI [0.002, 0.109], indicates a negligible association between title type and syntactic structure.

Table 9 Distribution of multi-unit title syntactic structures.

These findings are consistent with previous research demonstrating the predominance of nominal constructions in RA titles in medicine and across disciplines (Perdomo and Morales 2024; Diao 2021; Morales et al. 2020; Jalilifar 2010; Gesuato 2008; Busch-Lauer 2000). In a comparable study of Turkish RAs, Demir (2023) reported that nominal-nominal constructions accounted for more than 85% of titles in medicine and 100% of titles in engineering. This suggests a global trend towards the use of nominal constructions in both single-unit and multi-unit RA titles, irrespective of culture and discipline.

The dominance of nominal-nominal forms reflects their communicative significance in academic discourse, as they enable authors to maximise informativeness while maintaining structural economy (Hyland and Zou 2022; Wang et al. 2024). ChatGPT’s close mirroring of this pattern suggests a strong attunement to medical titling conventions, indicating that the model has internalised not only surface-level length conventions but also deeper syntactic preferences. By contrast, verbal-nominal constructions appeared only marginally in both corpora, a finding that aligns with Busch-Lauer’s (2000) observation that medical RA titles typically avoid stylistic variation in favour of unemotional and information-dense formulations. ChatGPT’s close replication of this tendency, with near-identical proportions to human authors, suggests alignment with prevailing genre practices.

Overall, the parallel distributions observed across the two corpora point to ChatGPT’s close alignment with human syntactic preferences in multi-unit titles. Yet this convergence invites reflection on broader implications. While the alignment demonstrates ChatGPT’s utility in producing convention-compliant titles, it also highlights potential risks for disciplinary discourse. Multi-unit nominal constructions, by packing in extensive methodological and content information, may inadvertently encourage over-inflation of detail or reduce the prominence of interpretive or conceptual framing. In practice, this could contribute to titles that are formally correct but rhetorically dense, potentially limiting readability or the nuanced signalling of study aims.

Title content focus

Analysis of title content focus (see Table 10) reveals significant differences between human-authored and ChatGPT-generated titles, χ²(5, N = 600) = 126.54, p < 0.001. However, two cells (16.7%) had expected counts below the recommended threshold of 5 (minimum expected count = 0.50), indicating a violation of the chi-square test assumptions. To address this, bootstrap methods were employed to provide robust estimates. The effect size, measured by Cramér’s V, was 0.459 (95% CI [0.424, 0.495]), indicating a strong association between title type and content focus despite the assumption violation.

Table 10 Distribution of title content focus.
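The bootstrap procedure referred to above is not specified in detail; one plausible implementation, sketched below with invented frequencies rather than the actual Table 10 counts, resamples titles with replacement within each corpus and recomputes Cramér’s V to obtain a percentile confidence interval.

```python
import numpy as np
from scipy.stats import chi2_contingency

rng = np.random.default_rng(0)

def cramers_v(table):
    table = table[:, table.sum(axis=0) > 0]      # drop categories absent from a given resample
    chi2 = chi2_contingency(table, correction=False)[0]
    k = min(table.shape) - 1                     # df* = min(rows, cols) - 1
    return np.sqrt(chi2 / (table.sum() * k))

# Each corpus as a vector of content-focus codes (0-5); frequencies below are invented.
human = np.repeat(np.arange(6), [60, 200, 2, 10, 3, 25])
chatgpt = np.repeat(np.arange(6), [20, 240, 5, 15, 12, 8])

boot_vs = []
for _ in range(2000):
    h = rng.choice(human, size=human.size, replace=True)     # resample within each corpus
    c = rng.choice(chatgpt, size=chatgpt.size, replace=True)
    counts = np.vstack([np.bincount(h, minlength=6), np.bincount(c, minlength=6)])
    boot_vs.append(cramers_v(counts))

ci_low, ci_high = np.percentile(boot_vs, [2.5, 97.5])        # percentile bootstrap CI for Cramér's V
print(round(ci_low, 3), round(ci_high, 3))
```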

Specifically, both human authors and ChatGPT markedly favored methods titles, accounting for more than three-quarters of each corpus (76.7% and 81.3%, respectively). This finding is consistent with medical RA titling conventions identified in previous studies. According to Busch-Lauer (2000) and Goodman et al. (2001), medical RA titles are expected to clearly convey the subject matter of the RA while providing sufficient details about study design and methods. Likewise, Siegel et al. (2006) caution that topic-only titles may limit readers’ ability to discern the novelty of the study whereas informative titles that include methodological details enable readers to assess more quickly whether the RA is relevant to their interests. Therefore, by foregrounding methods-oriented forms, both human and ChatGPT-generated titles conform to medical RA titling conventions, reflecting the epistemological value placed on methodological transparency in medical research reporting (Goodman et al. 2001; Siegel et al. 2006).

Nevertheless, closer examination reveals nuanced divergences that carry important rhetorical and epistemic implications. Specifically, human authors produced proportionally more topic-only titles (16.3% vs. 7.3%) and more titles combining methods with datasets (4% vs. 2%). These choices suggest deliberate rhetorical strategies, whereby human authors balance the need for methodological specificity with concise topical framing or the highlighting of key data resources. Such variation reflects a human capacity to negotiate disciplinary conventions, authorial voice, and journal-specific stylistic preferences, enabling subtle signalling of novelty, emphasis, or audience orientation.

ChatGPT, by contrast, generated relatively more dataset-focused titles (5.7% vs. 2.7%), more methods-results combinations (3.3% vs. 0.3%), and one explicit results title, a type absent from the human corpus. This pattern suggests that the model tends to prioritise empirical detail and outcome specificity, likely influenced by its training on large-scale medical corpora, where methodological transparency and empirical reporting dominate. While this tendency can enhance informativeness and precision, it also introduces potential risks: by systematically amplifying dominant content patterns, ChatGPT may inadvertently reinforce homogenised rhetorical structures, limit alternative framing strategies, and contribute to a subtle narrowing of stylistic and epistemic diversity within the discipline.

The presence of methods-results combinations and explicit results-reporting titles, albeit minimal (3.6%), is particularly noteworthy. Previous research findings indicate that such constructions are relatively uncommon but can effectively signal novelty or highlight key findings (Aronson 2009). ChatGPT’s occasional adoption of these forms raises questions about whether the model is capable of productive rhetorical expansion or whether it reflects overgeneralization from its training data, privileging empirically dense constructions at the expense of concise or interpretive title formats.

Overall, the content-focus analysis illustrates both the capabilities and limitations of ChatGPT as a scholarly writing tool. While ChatGPT demonstrates remarkable alignment with disciplinary norms, its outputs may contribute to the standardization of title content, privileging methodological transparency over conceptual framing, and potentially constraining reader engagement or interpretive flexibility. In EAP and pedagogical contexts, this highlights the need for advanced GAI literacy, which entails not only awareness of potential biases but also active intervention to ensure rhetorical diversity. Learners and authors should critically evaluate whether AI-generated titles maintain a balance between informational density, conceptual emphasis, and readability, selectively modifying content focus to preserve clarity, novelty, and disciplinary appropriateness.

Conclusion

This study compares human-authored RA titles in high-impact medical journals with those generated by ChatGPT across length, form, syntactic structure, and content focus. Overall, the results reveal a high degree of convergence between human authors and ChatGPT in the construction of titles. Mean title length is broadly comparable, with no statistically significant differences between human- and ChatGPT-generated titles. Both sets of titles show a strong preference for multi-unit forms, within which nominal-nominal constructions dominate. In terms of information content, methods titles account for most titles in both corpora, although ChatGPT generates proportionally more results-focused titles than human authors. These similarities suggest that ChatGPT can reproduce established medical titling conventions with considerable accuracy, likely reflecting its exposure to large-scale medical corpora and internalised genre regularities.

However, this apparent alignment warrants critical reflection. Across all levels of analysis, ChatGPT shows a strong tendency to adhere to prevailing norms which, if uncritically adopted, may reinforce dominant conventions and contribute to homogenization of disciplinary discourse. Its consistent preference for multi-unit, nominal, and methods-focused titles highlights the risk of rhetorical stagnation, potentially limiting stylistic diversity and reducing opportunities for concision, creative phrasing, or alternative rhetorical strategies. Moreover, the relative emphasis on methodological details and explicit results, while informative, may foreground particular reporting practices embedded in the model’s training data.

From an EAP perspective, these findings underscore the need for more advanced forms of GAI literacy. Instructors and authors should not only maintain critical awareness of AI-generated texts but also actively evaluate whether such outputs align with communicative goals, audience expectations, and stylistic considerations. Pedagogical guidance should therefore include discussion of alternative titling strategies, the implications of rigid adherence to conventions, and the ways AI outputs may subtly shape disciplinary writing norms. By combining AI assistance with informed human judgment, writers can leverage ChatGPT’s capacity to model conventions while mitigating the risks of formulaic or homogenised output.

In sum, ChatGPT demonstrates a robust ability to mirror human practices in medical RA titles, yet this strength also presents a tension: close alignment with genre norms may inadvertently constrain rhetorical variation and reinforce entrenched patterns. Responsible use in both academic practice and pedagogy therefore requires careful human oversight, critical engagement, and explicit instruction in balancing conformity with innovation.

While this study offers important insights into the similarities and differences between human-authored and ChatGPT-generated medical RA titles, several limitations should be acknowledged. First, the analysis is restricted to titles from a single discipline, which may not capture variation across disciplines with different rhetorical and stylistic conventions. Titles in the humanities, for example, often privilege creativity and metaphor (e.g., Haggan 2004), whereas those in the sciences tend to emphasise precision and methodology (e.g., Busch-Lauer 2000). Future studies should expand the scope of analysis to encompass a broader disciplinary range to assess whether ChatGPT adapts equally well across knowledge domains.

Second, ChatGPT-generated titles were produced using article abstracts as input rather than full manuscripts and through a single, non-iterative prompting procedure. While abstracts encapsulate the core aims, methods, and findings of a study, they do not always reflect the full scope, nuance, or rhetorical positioning of the complete manuscript. Moreover, this one-shot prompting approach may not fully capture how researchers interact with GAI tools in authentic writing contexts, which often involve iterative prompting, revision, and human intervention. Future research could address this limitation by examining title generation based on full-text inputs and by exploring how different prompting strategies and revision cycles shape the rhetorical features of GAI-generated titles.

Third, the study focuses on a single GAI system (ChatGPT) and does not compare outputs across different models or versions. Given the rapid development of LLMs and the documented variability across systems and updates, the findings should be interpreted as model-specific rather than generalisable to all GAI tools or model iterations. Comparative studies examining multiple models or successive versions of the same system would offer a more comprehensive understanding of GAI performance in academic titling.

Fourth, the study did not incorporate blind human evaluations of title naturalness, clarity, or perceived appropriateness. While genre- and corpus-based methods allow for systematic comparisons of formal and rhetorical features, complementary perceptual evaluations by disciplinary experts could provide additional insights into how GAI-generated titles are perceived by human readers.

Finally, potential domain-specific biases inherent in the training data of LLMs cannot be ruled out. ChatGPT’s exposure to large volumes of biomedical and medical texts may influence its reproduction of dominant disciplinary conventions while marginalising less common or emerging titling practices. Such biases may shape the patterns observed in the generated titles and warrant further investigation across disciplines with differing writing conventions.