Introduction

In the public and academic debate about gender-inclusive German, a recurring argument against the use of gender-inclusive forms is that they make “a text cumbersome” (Schneider 2020),Footnote 1 that the language “becomes even more complicated and off-putting for foreigners considering to learn German” (Rock et al. 2021)Footnote 2 and that gender-inclusive language makes “texts unreadable and longer”Footnote 3 (Web editorial staff of the LpB BW 2023). Also, the linguistic effort that is necessary and that makes texts “long, monotonous and gender-fixated”Footnote 4 (Eisenberg 2022) is seen as a disadvantage of gender-inclusive language, both in the public debate and among some linguists. However, empirical evidence on the readability of gender-inclusive texts in German shows that gender-inclusive language does not reduce comprehensibility (Braun et al. 2007, Blake and Klimmt 2010, Friedrich and Heise 2019). A rapid habituation effect for gender-inclusive forms has also been shown for French (Gygax and Gesto 2007): The study suggests that reading is temporarily slowed during the initial exposure to inclusive forms. However, with subsequent encounters, the reading speed becomes comparable to that of non-inclusive texts. Speyer and Schleef (2019) show a similar effect for the use of singular they, comparing native speakers and learners of English. Despite these empirical insights, criticism of and resistance to gender-inclusive language continues to be widespread (see section Gender-inclusive language and personal nouns in German). With our corpus-based annotation study, we attempt to empirically assess this ‘challenge’ of gender-inclusive language in German. To the best of our knowledge, there are currently no statistics regarding the extent to which gender-inclusive language would impact German language material if non-inclusive texts were to be re-written. More generally, there is a lack of data on the proportion of text that actually refers to human beings, i.e. that could potentially be subject to gender-inclusive language.

There are only a few studies dealing with the quantitative-empirical analysis of personal nouns in written texts from a linguistic perspective, e.g. corpus-based studies targeting linguistic entities that are easy to recognise automatically, such as pronouns, especially in English texts (e.g. Saily et al. 2011, Zeng 2023, cf. also Motschenbacher 2015, 34–35); some studies are enriched with manual analyses (e.g. Baker 2010, Rosola et al. 2023); there is also an increasing interest in the topic in economic disciplines (e.g., for German texts cf. Eugenidis and Lenz 2022). However, even if personal nouns could be detected automatically in the future (cf. Sökefeld et al. 2023 for an initial attempt), the problem of identifying reference would remain unsolved. This is especially relevant for the so-called masculine generic, which is the main focus of gender-inclusive language, and which cannot be distinguished from masculine specifics by mere form (cf., e.g., Schmitz et al. 2023). Manual annotation is therefore necessary for our research question. Our starting points are studies in which German personal nouns are analysed manually and on the basis of small linguistic datasets (e.g. Doleschal 1992, Kusterle 2011, Pettersson 2011). The annotation system for the present study was developed on this basis (cf. section Corpus and source selection).

The article is structured as follows: In the section Gender-inclusive language and personal nouns in German, we provide more background information on gender-inclusive language and personal nouns in German. In the Method section, we describe the method of our study, followed by results and discussion in Results & Discussion. We conclude our paper with a brief Conclusion and Outlook.

Gender-inclusive language and personal nouns in German

German is a language with three grammatical genders (masculine, feminine, neuter). There is a mix of semantic and formal regularities to assign grammatical gender to words, but, according to Hellinger and Bußmann, for “approximately 90% of German monosyllabic nouns, gender class membership can be predicted from morphophonological criteria” (2003, 143). Gender assignment of personal nouns, however, requires special attention, as it is often driven by lexical-semantic factors: “The assumption that, in principle, the assignment of a German noun to one of the three gender-classes is arbitrary is unfounded in the field of animate/personal nouns, where explicit relations between grammatical gender and the noun’s lexical specification can be formulated” (Hellinger and Bußmann 2003, 146). We can therefore summarise that the gender assignment of personal nouns often depends on external correlates (Corbett 2013, 1), i.e. the gender of a noun’s referent (McConnell-Ginet 2013, 5).

According to Hellinger and Bußmann (2003, 150–160), when referring to persons in German, we can distinguish between personal nouns that specify referential gender by grammatical, lexical or morphological means:

  1. Specification of referential gender by grammatical means: Singular personal nouns in German that are derived from adjectives (like gesund, ‘healthy’) and verbs (studierend, present participle of studieren, ‘to study’; abgeordnet, past participle of abordnen, ‘to delegate’) use the grammatical gender of the articles (e.g. die (f.) Gesunde vs. der (m.) Gesunde) or the adjective inflection (e.g. eine Abgeordnete (f.) vs. ein Abgeordneter (m.)) “to make referential gender explicit or overt” (also called Differentialgenus ‘double gender’; cf. Hellinger and Bußmann 2003, 150). Gender specification in these nouns is neutralized in the plural, since articles and other determiners do not vary for grammatical gender in the plural (die Gesunden (m./f.pl.) ‘the healthy’, die Studierenden (m./f.pl.) ‘the students’, die Abgeordneten (m./f.pl.) ‘the delegates’). Some indefinite pronouns can also have this kind of double gender, e.g. keine/jede (f.) vs. keiner/jeder (m.) or keines/jedes (n.; ‘no’/‘each, every’); some others are grammatically invariable and always masculine (so-called generic pronouns like jemand ‘somebody’ or niemand ‘nobody’).

  2. Specification of referential gender by lexical means: Gender-specification by lexical means is often realised in compounds that denote occupations and functions, containing the second elements -mann (‘-man’) or -frau (‘-woman’) (like Feuerwehrmann, Feuerwehrfrau, ‘firefighter’). Additionally, there are nouns where referential gender is encoded in the lexical meaning and usually results in lexical pairs, e.g. die Tante ‘aunt’ vs. der Onkel ‘uncle’; die Tochter ‘daughter’ vs. der Sohn ‘son’. In this category, grammatical gender is congruent with extra-linguistic gender. In the following, we call these nouns lexical gender nouns.

  3. Specification of referential gender by morphological means: The most prominent way to specify referential gender in German is to use suffixes that make the noun gender-specific. This function is mostly carried out by the feminizing suffix -in, which can be attached to most masculine derivation bases (e.g. Arbeiter/Arbeiterin, ‘male/female worker’; Maler/Malerin, ‘male/female painter’) (for more marginal feminizing suffixes, cf. Doleschal 1992, 27–29, Hellinger and Bußmann 2003, 152–153; compare the superficially similar, but functionally different suffix -ess in English, Stefanowitsch and Middeke 2023). There is only a small set of feminine bases that are used to derive masculine terms from feminine ones in the human domain: Braut/Bräutigam (‘bride/bridegroom’), Witwe/Witwer (‘widow/widower’) and Hexe/Hexer (‘witch/witcher’).

As an option to neutralize referential gender in German (besides the use of plural forms of nominalized adjectives/participles), we can use collectives (e.g. society, group, family, etc.) and epicene nouns, i.e. personal nouns with a fixed grammatical gender that can refer to any extra-linguistic gender. These nouns occur in all three grammatical genders (e.g. die Person, f. ‘person’, der Mensch, m. ‘human being’; das Kind, n. ‘child’) (Corbett 1991, 67, cf. Klein 2022).

Resulting from these gender differentiations, German has various kinds of pair forms when denoting humans: a) double gender pairs (e.g. der Kranke/die Kranke, ‘sick person’); b) (asymmetrical) lexical pairs (Krankenschwester/Krankenpfleger ‘nurse/male nurse’; Vater/Mutter ‘father/mother’); c) masculine forms with feminine derivations (e.g. der Arzt/die Ärztin, ‘male/female doctor’). All of these are semantic minimal pairs, i.e. they have the opposing semantic features +male/-female and +female/-male (Diewald 2018, 290–293). Within these pairs, the masculine form usually has two functions: first, as a masculine specific and, second, as a so-called ‘masculine generic’. The term denotes the use of the masculine form to refer to an individual or a group of people whose gender is unknown, irrelevant, or ignored (like Wissenschaftler, ‘scientists’, for a group of scientists). Masculine generics are a common phenomenon in grammatical gender languages when referring to humans of mixed or indefinite sex (Corbett 2013). Parallel to that, there can be feminine generics in German (e.g., referring to all scientists with the feminine term Wissenschaftlerinnen), but these are very rare compared to masculine generics and are often used consciously as a means of gender-inclusive language, e.g. in recent years in the newspaper Die Zeit (Dülffer 2018).

Whether a masculine personal noun refers specifically to a male person or generically to a group or an individual of unknown gender cannot be decided based on the surface form. On the one hand, the masculine and feminine forms of a personal noun like Wissenschaftler (‘scientist’) may be used as semantic minimal pairs to refer to male vs. female individuals. Consider the following context: Das Podium bestand aus drei Wissenschaftlern und einer Wissenschaftlerin. (‘The panel consisted of three male scientists and one female scientist’). In this case, the linguistic category ‘grammatical gender’ reflects the social gender of the extra-linguistic referents (i.e. the masculine form maps onto male referents, the feminine form maps onto female referents). On the other hand, the grammatically masculine terms are also used to refer to mixed groups of people, to people of unknown gender, or in contexts where gender is presumably irrelevant, e.g. in contexts like die Wissenschaftler sind sich bislang nicht einig (‘the scientists [m.pl.] do not yet agree’). Here, grammatical gender is assumed to be a neutral category, i.e. not carrying information about referential gender. The reference of the superficially identical masculine lexemes is only resolved in context, which is why the question of whether a masculine form is used specifically (i.e. to designate individual male referents) or generically (i.e. for indefinite referents or mixed groups) cannot yet be detected automatically (Sökefeld et al. 2023, 38) and must be examined individually for each case (Elmiger et al. 2017, 64).

The use of masculine generics to denote all genders is the subject of controversial societal and academic debates (Pusch 1984, Müller-Spitzer 2022a, 2022b, Simon 2022, Trutkowski and Weiß 2023). Proponents of gender-inclusive language usually do not accept it as a gender-neutral way of person reference (e.g. Hellinger and Bußmann 2003, 160–161, Acke 2019, 308). Opponents of new forms of gender-inclusive language, by contrast, consider the masculine generic to be gender-neutral ‘by default’ (sometimes based on Becker’s assumption of conversational implicatures, cf. Becker 2008; or based on Jakobson’s concept of markedness, cf. Eisenberg 2020, Meineke 2023, or on selective historical data, cf. Trutkowski and Weiß 2023). However, many psycho- and neurolinguistic studies find that the so-called masculine generic is not always understood neutrally but rather activates a male bias (e.g., Gygax et al. 2008, Körner et al. 2022, Glim et al. 2023, Zacharski and Ferstl 2023), i.e. it “does not represent men and women equally well” (Glim et al. 2023, 2). These effects are, at least in part, due to the grammatical properties of German, in which the masculine form fulfils the double function outlined above (Garnham et al. 2012). In addition, gender stereotypes and true gender ratios in the respective societal groups modulate these effects (Gygax et al. 2016).

In 2018, the German Personal Status Act was amended to introduce a third gender option (called divers) for intersexual individuals. This development has made the question of how to best address people beyond the binary spectrum more urgent (Kaplan 2022; for research on this topic in other languages cf., e.g., Decock et al. 2023, Thorne et al. 2023). An option already well established in the language system is to use neutralizations such as epicene nouns, or derivatives of adjectives and verbs in the plural. However, besides established feminization strategies (pair forms like Lehrerinnen und Lehrer, ‘female and male teachers’), so-called gender symbols, which are inserted between the masculine base and the feminine suffix, have come into use. They are intended to encompass all gender identities (e.g. Lehrer*innen, Lehrer:innen, ‘teachers of all genders’; cf. Friedrich et al. 2021, Körner et al. 2022), which a recent psycholinguistic study suggests is indeed the case (Zacharski and Ferstl 2023).Footnote 5 The symbols work particularly well in the plural because dependent elements are not marked for gender in the plural and therefore remain unchanged, and because the morphological combination of a masculine base and the feminine derivation suffix is easy to split with a symbol in the plural. Some qualitative studies have already found tendencies for fewer masculine generics and more gender-inclusive forms (Elmiger et al. 2017, cf. Adler and Plewnia 2019, Krome 2020). Quantitative studies on the use of these symbols are still scarce (e.g. Sökefeld 2021, Waldendorf 2023).

In the wake of this debate, many public bodies, large companies and other institutions are now issuing guidelines on gender-inclusive language (cf. links to guidelines of German-speaking cities; Müller-Spitzer et al., 2023, 5). However, this new awareness of gender-inclusive language has been accompanied by strong counter-movements that continue to fuel the debate and challenge the ideas behind gender-inclusive language in general (for discussions about gender-inclusive Spanish, cf. Banegas and López 2021a). Opponents often argue that the gender symbols are not part of the German language/spelling system and thus should be regarded as ‘mistakes’ (e.g. Eisenberg 2022). It is also claimed that they distract from the essential content of a text, or that they make texts harder to read, especially for children, L2 learners, or the visually impaired (e.g. Kalverkämper 1979, Rothmund and Christmann 2002, von Münch 2023). However, we argue that such effects are only likely if gender-inclusive texts are very different from those that are not gender-inclusive. This is the point of departure for our main research question. By analysing, on the basis of a large corpus, how much of a text would potentially be affected by gender-inclusive language in German, we contribute quantitative data to assess the actual relevance (measured as the proportion of affected textual material) of these claimed effects.

Method

Corpus and source selection

Our study is based on the German Reference Corpus (DeReKo; Kupietz et al. 2010, 2018), from which a sample of texts was selected (cf. section Sampling). These were taken from four sources: the DPA (Deutsche Presseagentur ‘German Press Agency’) and the three magazines Brigitte, Zeit Wissen, and Psychologie Heute. The DPA texts are the central resource for the study. There are several reasons for this. First, DPA is the biggest news agency in Germany, and its reports are distributed to almost all major radio stations and daily newspapers (Pürer and Raabe 2007, 29, 327). Its texts are often re-printed verbatim or only with slight variations. Second, DPA is obliged to be impartial and independent from political parties, worldviews, economic and financial groups, and governments,Footnote 6 meaning that its reports can be considered as objective as possible. Third, DPA only recently announced its decision to use more gender-neutral language from now on,Footnote 7 meaning that DPA texts from before 2021 are not already (consciously) gender-inclusive and thus serve as a good basis to investigate non-gender-inclusive language. Therefore, we only included texts from the years 2006–2020 to tackle our research questions. Fourth, our aim was to annotate whole texts, as selecting only excerpts could have undesirable biasing effects, e.g. a masculine form could be interpreted as generic, although earlier/later in the text a specific referent is introduced. DPA press releases have an average length of 339 tokens in DeReKo (cf. Table 1) and are therefore relatively short, making them well suited for whole-text annotations. To check whether similar patterns would be found in entirely different media outlets, we created a control corpus containing longer texts. The magazines Brigitte, Zeit Wissen, and Psychologie Heute were selected because they have a more general societal outlook and/or cover popular science topics. 
All three are issued by different publishers, minimizing the influence of publisher-specific guidelines.Footnote 8

Table 1 5th percentile, median value (50th percentile), mean value and 95th percentile for word (token) counts in the four sources. Where applicable, figures are rounded to the nearest integer.

Sampling

In the overall corpus (DeReKo), there are 2,322,095 documents available for all four sources. The sampling process was based on the number of words (tokens) per document. For each source, we calculated the 5th and 95th percentile of token counts. For DPA, the interval is I = [87, 837], i.e. 90% of all DPA documents are between 87 and 837 words long; these documents were the candidates for the sampling procedure. The values for the other sources, as well as median (50th percentile) and mean values, are given in Table 1. The upper bound for the magazine sources is generally higher than for the DPA documents, i.e. the magazine sources contain more long documents. This is also reflected in the median and mean token counts for the four sources.

We randomly sampled a fixed number of documents that fall into the inner 90% of token counts (between the 5th and 95th percentile). For DPA, we sampled 190 documents, and for the magazine sources 40 documents each, i.e. we had a total of 310 sampled documents. After annotation, 261 texts remain in the corpus (for details, see the section Annotation process). Their token counts are summarized in Table 1 under ‘Annotated Sample’.
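The sampling step described above can be sketched as follows. This is a minimal illustration, not the procedure actually used for DeReKo: the function names, the `(doc_id, token_count)` input format, the fixed seed, and the choice of the "inclusive" percentile method are all assumptions made for the example.

```python
import random
import statistics


def inner_90_percent(token_counts):
    """Return the 5th and 95th percentile of a list of token counts."""
    # With n=20, statistics.quantiles returns 19 cut points; the first one
    # is the 5th percentile and the last one is the 95th percentile.
    cuts = statistics.quantiles(token_counts, n=20, method="inclusive")
    return cuts[0], cuts[-1]


def sample_documents(docs, n_sample, seed=0):
    """Draw n_sample documents without replacement from the inner 90%.

    docs is a list of (doc_id, token_count) pairs (hypothetical format).
    """
    lower, upper = inner_90_percent([count for _, count in docs])
    candidates = [doc for doc in docs if lower <= doc[1] <= upper]
    rng = random.Random(seed)  # fixed seed only to make the sketch repeatable
    return rng.sample(candidates, n_sample)


# Toy source with 1,000 documents of 1 to 1,000 tokens each.
toy_docs = [(f"doc{i}", i) for i in range(1, 1001)]
lower, upper = inner_90_percent([count for _, count in toy_docs])
sample = sample_documents(toy_docs, n_sample=190)
```

Run once per source with the respective sample size (190 for DPA, 40 for each magazine), this would yield the 310 sampled documents mentioned above.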

Annotation process

The aim of the manual annotation conducted for this study was to find all tokens that would have to be changed if the text was reformulated in a gender-inclusive way. Our annotations focus on expressions that refer to natural persons, i.e. usually heads of noun phrases (NPs) in the form of nouns or pronouns (cf. Stede 2016, 55). Accordingly, we follow an action-theoretical concept of reference based on the interpretation of the target item in the given context (Pettersson 2011, 57). In addition to the head of the noun phrase, dependent elements in the NPs are annotated. For that, we decided to apply a strict bottom-up approach, i.e. to identify the head first and then select the elements that depend on it (especially articles and attributive adjectives, cf. Table 2, as these can theoretically be affected by gender-inclusive language, as opposed to genitive constructions or prepositional phrases).

Table 2 Illustration of the necessity to use a gender-inclusive form (1: original sentence, in italics the token that has to be changed in case of using gender-inclusive language; 2: possible reformulation using pair forms; 3: possible reformulation using gender asterisk).

The manual that served as the basis for the annotations was developed over the course of several months. Modifications were implemented after each training round, when we could see difficulties and uncertainties regarding the application of the manual. Building upon the insights from these pre-tests, the annotation scheme underwent refinement and expansion. The more elaborate annotation scheme was then used in its final version for the present study, which was conducted from December 2022 to March 2023. Two research assistants (in the following called annotators A and B) annotated the texts simultaneously. 261 of the 310 sampled documents were annotated by both annotators,Footnote 9 yielding an overall inter-annotator agreement of 77.89%. The version of the annotation scheme used for this study consists of eleven categories with various sublayers. The decision tree in Fig. 1 illustrates the dependencies between them. Further information about the decision tree, the layers, the annotation procedure, and the inter-annotator agreement can be found in the supplementary material.Footnote 10

Fig. 1: Annotation scheme.

Decision tree for the annotation software and process. [This figure is covered by the Creative Commons Attribution 4.0 International License. Reproduced with permission of IDS Mannheim; copyright © IDS Mannheim, all rights reserved.].

As ‘necessity to use a gender-inclusive form’ is the central category for our study, it is described in more detail here rather than exclusively in the supplementary material. Within the annotation procedure, it is necessary to indicate for each annotated token whether it would need to be replaced by another form in order to make the text gender-inclusive. For personal nouns, this is usually the case if the form is annotated as a masculine or feminine generic. Regarding pronouns, this is only the case for generically referring personal pronouns.Footnote 11 Dependent elements need to be adjusted only if the head of the NP would be subject to change – however, dependent elements need to be thoroughly checked to determine whether they would actually change form in case of an adjustment to gender-inclusive language (e.g. most attributive adjectives are identical for feminine and masculine gender: der kranke Patient/die kranke Patientin; ‘the sick [male/female] patient’). Example (a) illustrates this further:

  a. Während die einst potenten Sozialdemokraten im Bund unter ihrem Parteichef Kurt Beck in der Krise stecken, träumen die traditionell schwachen bayerischen Genossen von der Machtübernahme im Freistaat. (‘While the once-powerful Social Democrats at the Federal level are in crisis under their leader Kurt Beck, their traditionally weak Bavarian comrades dream of taking power in the Free State [Bavaria].’; DPA08_JUL03207)

In this sentence, only two nouns (Sozialdemokraten and Genossen) are annotated as having a ‘necessity to use gender-inclusive form’. The dependent elements in the noun phrases would not need to be changed, as can be seen in Table 2: Even if the heads were changed to gender-inclusive forms, the dependent elements would remain the same. The excerpt also contains a specific male person (Kurt Beck) and a masculine role noun, Parteichef (‘party leader’), referring to him. Accordingly, Parteichef is annotated as a personal noun with specific male reference.
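The decision rule for the category ‘necessity to use a gender-inclusive form’ can be summarised in a few lines of code. The sketch below is a simplification under stated assumptions: the `Token` class, its field names, and the class labels are hypothetical and do not reproduce the actual annotation scheme or software.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Token:
    form: str
    linguistic_class: str            # "personal_noun", "pronoun" or "dependent"
    generic: bool = False            # generic (masculine or feminine) reference
    head: Optional["Token"] = None   # dependent elements: head of their NP
    form_changes: bool = False       # dependent elements: would the form differ?


def needs_inclusive_form(token):
    """Return True if the token would change in a gender-inclusive rewrite."""
    if token.linguistic_class in ("personal_noun", "pronoun"):
        # Heads are affected only when they refer generically.
        return token.generic
    if token.linguistic_class == "dependent":
        # Dependent elements are affected only if their head changes AND
        # their own form would actually differ after the change.
        return (
            token.head is not None
            and needs_inclusive_form(token.head)
            and token.form_changes
        )
    return False


# Tokens from example (a), annotated by hand for this illustration.
sozialdemokraten = Token("Sozialdemokraten", "personal_noun", generic=True)
potenten = Token("potenten", "dependent", head=sozialdemokraten)
parteichef = Token("Parteichef", "personal_noun", generic=False)
```

Applied to example (a), the rule marks Sozialdemokraten as affected, while the attributive adjective potenten and the specifically referring Parteichef stay unchanged.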

In what follows, we will report analyses based on the 261 documents that were annotated by both annotators.Footnote 12 Figure 2 gives an overview of the token count distributions of candidate texts (i.e. all texts after selecting the inner 90% of token counts for each source) and the 261 texts on which we base our analyses. Comparing these distributions, we can conclude that the texts reported here provide a good reflection of the underlying token count distributions for DPA, Brigitte, Psychologie Heute, and Zeit Wissen.Footnote 13

Fig. 2: Distribution of token counts.

Distribution of token counts (y-axis) for the candidate texts (grey violins), i.e. all texts with a token count in the inner 90% of all texts from this source (x-axis). Data points represent token counts for all 261 texts reported in the remainder of the paper (one point per document). [This figure is covered by the Creative Commons Attribution 4.0 International License. Reproduced with permission of IDS Mannheim; copyright © IDS Mannheim, all rights reserved.].

Results & Discussion

Person reference and linguistic classes

In total, the 261 texts annotated by both annotators comprise 120,626 tokens.Footnote 14 Without punctuation marks, 93,533 tokens remain.Footnote 15 Of these, 11,375 tokens (12.2%) were annotated by at least one annotator as having person reference (i.e., as belonging to the linguistic classes 1–3: personal noun, pronoun, dependent element). The annotators agreed on the linguistic class of 8,840 (A = B; 77.71%) of these tokens; another 675 (5.93%) were annotated by both, but with diverging linguistic classes (A ≠ B); 1,860 tokens (16.35%) were annotated by only one annotator (A v B). This means that the vagueness regarding which token can be considered person reference is roughly 16%. Importantly, this vagueness (or uncertainty) is distributed unevenly across linguistic classes. Dependent elements (LK_3) caused the most uncertainty, constituting 58.06% of all tokens that were annotated only once. From what we discussed earlier regarding phrase structure, it is probable that this is due to uncertainties about which elements belong to an NP and therefore have person reference. Hence, an important takeaway for future studies is the necessity to improve the training of research assistants in the domain of phrase-structure grammar. With 27.37%, personal nouns (LK_1) ranked second regarding non-matching annotations. Pronouns were the least problematic, making up 13.82% of vagueness. The remaining proportion of uncertainty is attributed to nouns that superficially look like personal nouns but actually refer to objects or institutions or are used metaphorically (e.g. Partner, ‘partner’, to refer to a country). For this study, we only consider the 8,840 tokens with matching annotations to reliably represent the amount of person reference in the annotated texts. Personal nouns are the biggest category with 3,196 tokens (3.42% of all tokens; Sökefeld et al. 2023 find a similar proportion of personal nouns in their automatic detection tests), followed by dependent elements (3,097 tokens; 3.31%) and pronouns (2,547 tokens; 2.72%).
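The agreement shares in this paragraph follow directly from the reported token counts. The short calculation below only re-derives the percentages from those counts; the dictionary labels and variable names are chosen for illustration.

```python
# Token counts as reported in the paragraph above.
counts = {
    "same class (A = B)": 8840,
    "different class (A != B)": 675,
    "one annotator only": 1860,
}
tokens_without_punctuation = 93533

total_person_reference = sum(counts.values())
shares = {
    label: round(100 * n / total_person_reference, 2)
    for label, n in counts.items()
}
overall_share = round(100 * total_person_reference / tokens_without_punctuation, 1)

for label, share in shares.items():
    print(f"{label}: {share}%")
print(f"person reference among all tokens: {overall_share}%")
```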

All measures reported so far refer to all documents in the corpus as one large list of tokens. However, in order to accurately assess the relevant proportions of tokens, we have to consider the document level (which was also our level of sampling). We therefore determined the proportions for each document and then calculated overall means, weighted by the number of tokens each document contributes to this mean. For each value of the weighted mean, we report 95% confidence intervals according to a hypergeometric distribution in brackets. This is the appropriate method in this case because the annotated texts were sampled from the candidate texts without replacement.Footnote 16 Figure 3 shows the proportions of tokens with person reference for the DPA and the control (i.e. Brigitte, Psychologie Heute, Zeit Wissen) corpora. While the mean for all DPA documents is 7.99% (7.75%–8.23%), it is significantly higher for the control corpus at 11.06% (10.77%–11.35%). The mean of all sources taken together is 9.45% (9.26%–9.64%).
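A token-weighted mean of per-document proportions can be sketched as below. Note the assumptions: the intervals reported in this paper are based on a hypergeometric distribution, whereas the sketch substitutes a simpler normal approximation with a finite population correction; the `population` argument (the number of tokens in all candidate texts) is not given in the text and is purely hypothetical, as are the function names.

```python
import math


def weighted_mean_proportion(docs):
    """Token-weighted mean of per-document proportions.

    docs is a list of (n_hits, n_tokens) pairs, one per document.
    """
    total_tokens = sum(n_tokens for _, n_tokens in docs)
    # Each document's proportion is weighted by its share of all tokens.
    return sum((hits / n) * (n / total_tokens) for hits, n in docs)


def approx_ci(p, n, population, z=1.96):
    """Approximate 95% interval for a proportion sampled without replacement.

    Uses a normal approximation with finite population correction as a
    stand-in for an exact hypergeometric interval.
    """
    fpc = math.sqrt((population - n) / (population - 1))
    half_width = z * math.sqrt(p * (1 - p) / n) * fpc
    return max(0.0, p - half_width), min(1.0, p + half_width)


# Toy example: two documents with very different lengths.
toy_docs = [(5, 10), (0, 90)]
p_hat = weighted_mean_proportion(toy_docs)
interval = approx_ci(p_hat, n=100, population=10_000)
```

Weighting by token count makes the result identical to the pooled proportion (all hits divided by all tokens), so long documents contribute more than short ones; an unweighted mean of the two toy documents would be 0.25 instead of 0.05.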

Fig. 3: Proportion of tokens with person reference in DPA and control corpus.

Data points represent documents; the red square indicates the mean, including error bars that symbolize the 95% confidence intervals (sometimes these are fully covered by the square – e.g. in the left boxplot). [This figure is covered by the Creative Commons Attribution 4.0 International License. Reproduced with permission of IDS Mannheim; copyright © IDS Mannheim, all rights reserved.].

Necessity to use gender-inclusive language

The annotation category ‘necessity to use gender-inclusive form’ holds a pivotal role in addressing the primary research question of this study: to what extent would tokens within press texts have to undergo changes due to the adoption of gender-inclusive language? For DPA, the average share of tokens that would be affected by gender-inclusive re-editings is 0.73% (0.66%–0.81%), whereas it is 1.18% (1.09%–1.29%), and therefore significantly higher, for the control corpus (cf. Fig. 4). If we take all sources together, we get a proportion of 0.95% (0.89%–1.01%) that would be affected by gender-inclusive language. Considering only tokens with person reference, an average of 9.13% (8.25%–10.08%) would be affected by gender-inclusive language in DPA and 10.67% (9.82%–11.56%) in the control corpus. Here, the difference between the corpora is not significant. Taking all sources together, an average of 9.99% (9.37%–10.63%) of person references would be subject to gender-inclusive language. We can therefore record three central measures so far: all sources taken together, on average a) 9.45% of tokens are (part of) person references; b) 0.95% of all tokens would be affected by gender-inclusive language; and c) 9.99% of all tokens with person reference would be affected by gender-inclusive language.Footnote 17

Fig. 4: Proportion of tokens with necessity to use gender-inclusive language in DPA and the control corpus.

[This figure is covered by the Creative Commons Attribution 4.0 International License. Reproduced with permission of IDS Mannheim; copyright © IDS Mannheim, all rights reserved.].

Figure 5 shows that the largest proportion of affected tokens (tokens that would have to be changed) belongs to the category ‘personal nouns’ (799 tokens overall; 90.08% of the total of 887), i.e. gender-inclusive language would mostly affect nouns. All affected personal nouns are masculine generics, underlining that they are the focus of gender-inclusive language. The average proportion of personal nouns that would be changed by gender-inclusive language across all the documents is 25.00% (23.51%–26.54%). There are six documents in which all personal nouns would be subject to change (i.e. the dots at the 100% margin), but far more documents in which none of the personal nouns would need to be changed (N = 81, dots at the 0% margin). More details on the number of documents that would be affected by changes can be found in the following section. For the other two linguistic classes, the proportion is only marginal – in most documents, none of the pronouns or dependent elements would be subject to change. This is especially clear for pronouns, where the average proportion is 0.12% (0.02%–0.34%). For dependent elements, it is 2.62% (2.08%–3.24%), with one outlier document in which about 66.00% of dependent elements would be changed if gender-inclusive language were used in the document. Gender-inclusive re-editings would therefore rarely interfere with the grammar of the extended noun phrase.
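The central shares can be cross-checked against the raw counts given in the text. The calculation below combines the 887 affected tokens and 799 affected personal nouns from this section with the 3,196 agreed personal nouns and 93,533 tokens without punctuation reported earlier; combining the counts in this way is an assumption of the sketch, and the variable names are illustrative.

```python
# Counts taken from the running text.
affected_tokens = 887
affected_personal_nouns = 799
agreed_personal_nouns = 3196
tokens_without_punctuation = 93533

# Share of personal nouns among all affected tokens.
noun_share_of_affected = round(100 * affected_personal_nouns / affected_tokens, 2)
# Share of personal nouns that would have to be changed.
affected_share_of_nouns = round(100 * affected_personal_nouns / agreed_personal_nouns, 2)
# Share of all tokens that would have to be changed.
affected_share_of_tokens = round(100 * affected_tokens / tokens_without_punctuation, 2)

print(noun_share_of_affected, affected_share_of_nouns, affected_share_of_tokens)
```

The three values agree with the 90.08%, 25.00% and 0.95% reported for all sources taken together.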

Fig. 5: Necessity to use gender-inclusive form.

Necessity to use gender-inclusive form split by linguistic classes (all documents taken together). [This figure is covered by the Creative Commons Attribution 4.0 International License. Reproduced with permission of IDS Mannheim; copyright © IDS Mannheim, all rights reserved.].

Personal nouns

The category of personal nouns will be discussed in detail for four reasons: Personal nouns are (a) the most frequent linguistic class, (b) the class with the highest proportion of tokens that would be affected by gender-inclusive language, (c) the linguistic class with the most diverse annotation layers (cf. Supplementary Material, Section 1), and (d) the linguistic class most relevant to the study of the linguistic representation of people in texts (e.g. Hellinger and Bußmann 2003, 143).

First, we report the distribution of annotation layers for personal nouns. Taking all sources together, we see a prominence of epicene nouns (27.32%, 25.78–28.90%), closely followed by masculine generics (24.97%, 23.48–26.51%), and masculine specifics (22.93%, 21.49–24.43%). Lexical gender nouns (9.95%, 8.93–11.04%) and feminized forms (8.04%, 7.12–9.04%) are significantly rarer. We find no nouns that were annotated as feminine generics by both annotators. Figure 6 shows that the distribution of layers for personal nouns varies considerably between the two corpora. In DPA, masculine specifics are by far dominant (mean share of 36.47%, 34.10–38.89%), especially compared to the control corpus, where this category only amounts to an average share of 9.48% (8.09–11.02%). For all other categories, it is the other way around. The average shares of epicenes, masculine generics, lexical gender nouns, and feminized forms are always higher in the control corpus. We can deduce that DPA predominantly reports on specific male persons, using masculine forms, whereas the other sources tend to report more unspecifically, i.e. making use of gender-neutral forms (e.g. epicenes) and masculine generics. Referent gender is mostly specified by lexical gender nouns in the control corpus (e.g. Frau ‘woman’, Mann ‘man’), while such forms are infrequent in DPA.

Fig. 6: Types of personal nouns by corpus.

Only outlier documents are shown as data points. [This figure is covered by the Creative Commons Attribution 4.0 International License. Reproduced with permission of IDS Mannheim; copyright © IDS Mannheim, all rights reserved.].

Looking at the annotation layer ‘Which referent gender is recognizable from context?’, we see clear differences between DPA and the control corpus (cf. Fig. 7). In DPA, there is a strong male bias. If referent gender is recognizable from context in DPA,Footnote 18 a mean share of 80.37% (77.46–83.05%) of these tokens refer to men. Only an average of 19.01% (16.37%–21.89%) refer to women. This strong male dominance in news reporting is in line with findings of other studies (e.g., Saily et al. 2011, Lansdall-Welfare et al. 2017). In the control corpus, however, the bias disappears: the average share of tokens referring to women (52.87%, 48.56–57.14%) is even slightly higher than for men (45.29%, 41.04–49.59%). Brigitte has the biggest influence here—a mean of 60.54% (54.32–66.51%) of tokens for which referent gender is identifiable refer to women, while only 38.70% (32.76–44.90%) refer to men. In our corpus, no non-binary referents were identified. ‘Group’ reference (i.e. to mixed groups of men and women) is rare (N = 11 in all documents taken together) and not discussed further here. These findings indicate substantial differences in the way different sources include men and women in their reporting, which is most likely due to differences in topics and audiences (cf. e.g., Müller-Spitzer and Rüdiger 2022). However, comparable corpus studies are needed to draw such conclusions on a reliable basis.

Fig. 7: Share of tokens with ‘male/female gender’ or ‘group’.

Share of tokens for which ‘male/female gender’ or ‘group’ is deducible from context (total amount: tokens for which referent gender is recognizable from context). Only outlier documents are shown as data points. [This figure is covered by the Creative Commons Attribution 4.0 International License. Reproduced with permission of IDS Mannheim; copyright © IDS Mannheim, all rights reserved.].

Our method can thus also be used to quantify the occurrences of men and women mentioned in press texts. It encompasses all personal nouns and therefore goes beyond the analysis of proper names, which are used by Eugenidis & Lenz (2022), for example, to quantify the proportion of men and women on company websites. This is especially relevant as media outlets increasingly seek to scrutinize gender proportions within their articles. One example is the renowned German weekly magazine Der Spiegel, which conducted an analysis of gender proportions in their own texts (Pauly 2021). However, they pointed out that they could not include personal nouns in their evaluations because they used procedures for named entity recognition (Pauly 2021). Our approach could effectively supplement such automated procedures in grammatical gender languages. Additionally, our annotated dataset could serve as training material for developing automatic processes to detect personal nouns, particularly in terms of distinguishing between generic and specific references. The need to supplement automated processes with in-depth annotations is also highlighted by Sökefeld et al. (2023, 38).

Furthermore, our annotations allow us to analyse the distribution of masculine specifics and masculine generics in more detail and to investigate the embedding of masculine generics in actual language use. Here, we provide a brief insight, focusing on the document level of our data. In total, 116 of the 261 annotated texts (44.44%) contain both masculine generics and specifics. In 55 of these (47.41%), specifics are more common; in 48 documents (41.38%), it is the other way around. Another 22 texts (18.97%) have equal amounts of masculine specifics and generics. In 104 texts (39.85%), we find only one of the forms: 57 (54.81%) have only masculine specifics; 47 contain only masculine generics (45.19%). This means that there are 41 texts (15.71%) in which neither a specific nor a generic masculine is used. In sum, texts with both specifics and generics are most common, followed by texts with only masculine specifics and texts with only masculine generics. Texts without any of these forms are least common. Gender-inclusive re-editings would in sum affect 163 of the 261 annotated documents (62.45%). To put it differently, in more than a third of the documents, nothing would need to be changed if gender-inclusive language were applied.
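The document-level totals in this paragraph can be retraced with a few lines of arithmetic. The sketch below is illustrative only: it uses the counts reported in the text and assumes, as the paragraph implies, that a document is affected exactly when it contains at least one masculine generic. The variable and function names are hypothetical.

```python
TOTAL_DOCS = 261

# Document counts as reported in the text.
both_forms = 116      # contain masculine generics and masculine specifics
only_specifics = 57   # contain masculine specifics only
only_generics = 47    # contain masculine generics only


def pct(part: int, whole: int) -> float:
    """Percentage share, rounded to two decimals as in the text."""
    return round(100 * part / whole, 2)


# Texts with neither form are whatever remains of the 261 documents.
neither = TOTAL_DOCS - both_forms - only_specifics - only_generics

# Assumption: a document is affected if it contains any masculine generic.
affected = both_forms + only_generics
unchanged = TOTAL_DOCS - affected

print("neither form:", neither, pct(neither, TOTAL_DOCS))
print("affected:    ", affected, pct(affected, TOTAL_DOCS))
print("unchanged:   ", unchanged, pct(unchanged, TOTAL_DOCS))
```

Under this reading, the affected documents are the 116 texts with both forms plus the 47 texts with generics only, which yields the 163 documents named above; the remaining documents are those for which nothing would need to be changed.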

The prototypical use of the masculine generic is often considered to be found in abstract contexts (cf. Zifonun 2018, 49–50) in which no specific individuals are referred to and in which the semantic category ‘gender’ is presumably neutralised (ex. b). However, our data show that masculine generics can be used in a diverse set of contextual embeddings and with different levels of referentiality. We find, for example, four cases in which a masculine plural refers to a pair consisting of a man and a woman. They are introduced with their names in the text and then collectively referred to with a masculine generic (ex. c). We also find one masculine form with female reference (ex. d) in an enumeration with feminized forms and a lexical gender noun, all referring to the same woman. In many other cases, the masculine generic is used in contexts where a small and specific number of referents (ex. e) are introduced but whose genders are not specified in the rest of the text.

  b. Die Preisträger genießen an Schulen besonderes Ansehen. (‘(The) Award winners enjoy a special reputation at the school.’) (from Zifonun 2018, 50)

  c. […] haben die Psychologen Angela Duckworth und Martin Seligman […] (‘[…] the psychologists Angela Duckworth and Martin Seligman have […]’) (PH07_AUG.00032)

  d. Stylistin und Spielplatzmami, Kinderkutschierer und Großeinkäuferin. (‘Stylist and playground-mummy, children’s coachman and bulk buyer.’) (BRG10_JAN.00047)

  e. Sieben Umweltaktivisten aus verschiedenen Teilen der Welt […] (‘Seven environmental activists from different parts of the world […]’) (DPA08_APR.08223)

While having the power to level out the importance of gender in such contexts (Zifonun 2018, 50–51), the masculine generic can also be understood to veil referent genders and make women (and other genders) invisible or at least harder to include cognitively (as is suggested by various psycholinguistic studies on the male bias, e.g. Gygax et al. 2008, Körner et al. 2022, Zacharski and Ferstl 2023). The referential ambiguity of the masculine can challenge readers, raising the question of whether this challenge is greater than decoding gender-inclusive forms, which are unambiguous as they never refer solely to men. The annotated dataset is published with this paper (see Supplementary Material, Section 5), allowing further analyses of these different forms of embedding by any interested researcher.

Conclusion

Research into the connection of gender and language is closely tied to social debates on gender equality and non-discriminatory language use. By now, there is a growing body of studies investigating linguistic dimensions of the category ‘gender’. Psycholinguistic scholars have made significant contributions, particularly in addressing the male bias associated with masculine generics. However, there is a need for more corpus-based studies that investigate real language usage, as the debate on gender-inclusive language is mostly guided by unverified presuppositions. One major claim is that the use of gender-inclusive language makes texts too long, monotonous or difficult to read, and might even make it more difficult to learn German as a foreign language. These would be strong arguments against the use of gender-inclusive language, but they are not based on empirical evidence (cf. Pabst and Kollmayer 2023, Blake and Klimmt 2010, Friedrich and Heise 2019, and on German as a foreign language Peuschel 2022). Our data provide the first empirical quantitative basis for assessing how much textual material would actually have to be changed if non-gender-inclusive German texts were rewritten to be gender-inclusive. We extracted three central values from our analysis: an average of (a) 9.45% of all tokens are (or are part of) person references; (b) 0.95% of all tokens would be affected by gender-inclusive language; (c) 9.99% of tokens with person reference would be affected by gender-inclusive language. In total, more than one third of all documents we analysed would remain unchanged. The small proportion in (b) calls into question whether gender-inclusive German presents a substantial barrier to understanding and learning the language, particularly when we take into account the potential complexities of interpreting masculine generics. Furthermore, not all tokens would have to be replaced by more than one other token if we wanted to reformulate the texts in a gender-inclusive way. Many lexemes in German can be neutralised (e.g. by replacing the masculine generic Lehrer (‘teacher’) with the neutralising Lehrkraft (‘teaching staff’)). With this strategy, the length or complexity of the texts does not increase. In general, gender-inclusive language would almost exclusively concern nouns, for which there are already numerous strategies of implementing unobtrusive gender-inclusive variants that do not include the disputed gender symbols (e.g. pair forms and epicenes, cf. Steinhauer and Diewald 2017, 118, 132). A recent survey by the German public-broadcasting institution WDRFootnote 19 has shown that many of these variants are already widely accepted.

The low proportion of textual material affected by gender-inclusive language can also be approached from another perspective, namely by counting the number of explicit gender-inclusive forms in press texts that generally use such language. In Germany, the newspaper tazFootnote 20 is a prime example. Although it has no internal guidelines on gender-inclusive language, it is considered a ‘pioneer’ (Ochs and Rüdiger under review) in its use and is the only daily newspaper to use new strategies such as gender symbols (Lehrer*innen, Lehrer:innen) in a significant way (cf. Waldendorf 2023). However, the proportion of these forms in the whole text is only 0.2% (Ochs and Rüdiger under review). For the first time, our data provide a quantitatively reliable explanation for this: the proportion is most likely so low because only a small amount of linguistic material is affected by changes to gender-inclusive forms. Our results therefore point in the direction suggested by sociolinguistic studies: The discussions about gender-inclusive language may (ostensibly) revolve around issues of comprehensibility, readability and learnability, but there is more at stake – gender debates, including discussions about gender-inclusive language, are part of a broader cultural struggle (Blömen and Wilde 2019, Banegas and López 2021b, Roth and Sauer 2022). This is not to say, however, that the possible complexities of gender-inclusive language should not continue to be investigated empirically – on the contrary: As describing and comparing the complexity of linguistic items is a difficult endeavour, our data are mainly intended to provide future research with a quantitative baseline – e.g. to compare the values with proportions of other structures that are considered complex in German. Further research into the comprehensibility, readability and learnability of different gender-inclusive forms can take our insights into consideration.

Outlook

Finally, we would like to outline two possible follow-up studies. We see especially promising potential in combining our data with automatic extraction procedures for personal nouns (Sökefeld et al. 2023), e.g. by using our annotations as training data for the recognition of masculine specifics and generics. To further this approach, we are currently conducting analyses at the lexical level to determine whether certain types of personal nouns (e.g. passive role nouns such as neighbour or citizen, cf. Bühlmann 2002, 174) are more prone to being used as masculine generics. Additionally, as we are aware that text type and genre are crucial categories when it comes to the use of gender-inclusive language and personal nouns in general, we are conducting synchronic and diachronic analyses of other text types, e.g. city and company websites, letters to shareholders, protocols of parliamentary debates, Christmas and New Year addresses of the German chancellors and presidents (cf. Müller-Spitzer et al. 2022, Müller-Spitzer and Ochs 2023, 2024). This will help us better understand the use of person references across text types, i.e. beyond the press texts presented here. Furthermore, as corpus-based research into person reference is so far mostly limited to German and English, the extension of our approach to more languages would certainly be a fruitful addition to gender and language research.

Additionally, our annotated dataset could serve as a starting point to assess the difference (e.g., in terms of language processing or comprehensibility) between non-gender-inclusive and gender-inclusive texts. One possible approach to investigate this question is to use large language models (LLMs).Footnote 21 In this context, we could employ our annotated texts and prepare comparison versions rewritten to be gender-inclusive. We could then use a pretrained LLM to compute how difficult it is for the LLM to predict or process each text and its corresponding rewritten version. Comparing the difficulty of original and rewritten texts would then provide a quantitative measure of comprehensibility. This approach could be enriched by using several different forms of gender-inclusive language (comparable to Rosola et al. 2023), assessing the varying comprehensibility of these alternative forms with LLMs.
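One way to operationalise "how difficult it is for the LLM to predict" a text is mean surprisal, i.e. the average negative log probability per token. The sketch below illustrates only the comparison logic. It is not an implementation of the proposed study: a toy add-one-smoothed unigram model stands in for a pretrained LLM, the sentences are invented for illustration (using the Lehrer/Lehrkraft pair mentioned in the conclusion), and all function names are hypothetical.

```python
import math
from collections import Counter


def make_unigram_model(background_tokens, alpha=1.0):
    """Toy stand-in for an LLM: add-alpha smoothed unigram probabilities.

    The returned function has the signature cond_prob(context, token) so that
    a real model's conditional probabilities could be plugged in instead;
    this toy model simply ignores the context.
    """
    counts = Counter(background_tokens)
    vocab_size = len(counts) + 1  # one extra slot for any unseen token
    total = len(background_tokens)

    def cond_prob(context, token):
        return (counts.get(token, 0) + alpha) / (total + alpha * vocab_size)

    return cond_prob


def mean_surprisal(tokens, cond_prob):
    """Average surprisal in bits per token: -log2 P(token | preceding tokens)."""
    if not tokens:
        raise ValueError("tokens must not be empty")
    bits = [-math.log2(cond_prob(tokens[:i], tok)) for i, tok in enumerate(tokens)]
    return sum(bits) / len(bits)


# Invented background text and sentence pair, for illustration only.
background = "die lehrer kommen und die lehrer gehen".split()
model = make_unigram_model(background)

original = "die lehrer kommen".split()
rewritten = "die lehrkräfte kommen".split()

print("original :", round(mean_surprisal(original, model), 3), "bits/token")
print("rewritten:", round(mean_surprisal(rewritten, model), 3), "bits/token")
```

In this toy setting the rewritten sentence scores higher simply because `lehrkräfte` never occurs in the tiny background text, which says nothing about real comprehensibility. In an actual study, `cond_prob` would be replaced by the token probabilities of a pretrained LLM, and the comparison would run over the annotated texts and their rewritten versions.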