Introduction

This study investigates the colour-term (CT) vocabulary of young Russian speakers using an elicitation (list) task. We focus on dynamic processes in the Russian colour lexicon, specifically, neologisms and emerging linguistic models of colour naming that have developed over the past 30 years in response to the dramatic socio-economic changes of the post-Soviet era. A further aim is to examine diatopic variation in the CT inventory across two cities of European Russia: Kazan and Smolensk. Despite Russia’s geographical vastness, Russian is relatively homogeneous, particularly among younger speakers. Dialectal vocabulary differences are minor and typically limited to vernacular or pronunciation features, which do not hinder mutual intelligibility (Vendina 2016).

Universal colour-term inventory

Following Berlin and Kay’s (1969/1991) seminal work, it is widely accepted that there are universal, pan-cultural basic colour categories (BCCs) that partition the perceived colour gamut into conceptual regions. These are named by basic colour terms (BCTs), a “core” vocabulary posited to reach a maximum of 11: ‘white’, ‘black’, ‘red’, ‘green’, ‘yellow’, and ‘blue’ (primary BCTs), and ‘brown’, ‘grey’, ‘orange’, ‘pink’, and ‘purple’ (secondary BCTs).

Since 1969, the universalist hypothesis has been revised in light of theoretical critiques by proponents of the relativist view, who argue that colour categories are language-specific and shaped by non-trivial cultural constraints (Saunders and van Brakel 1997). Empirical findings have partly supported the linguistic-convention account of the colour lexicon (for reviews, see Davidoff 2015; Paramei 2020; Paramei and Bimler 2022).

The weak relativist account is currently broadly accepted (for a review, see Lindsey and Brown 2021). According to this account, in addition to universal cognitive factors, the range of terms that lexically distinguish perceived colour nuances is shaped by salient natural features, i.e., the “colour diet” of the language’s speakers (Josserand et al. 2021), and by the man-made visual environment (Thompson et al. 2020), e.g., the prominent colour shades of artefacts and colourant technology.

Socio-cultural and historical factors (cultural transmission bias) also shape the semantic structure of colour categories in non-universal ways, driving the emergence of novel (basic) CCs/CTs in response to the need for communicative efficiency (Gibson et al. 2017; Jameson and Komarova 2009; Zaslavsky et al. 2019, 2022). The cultural transmission bias explains why the tenet of an upper limit of 11 BCTs has recently been reconsidered. Based on their data for Russian, with the salient terms sinij ‘dark blue’ and goluboj ‘light blue’, Berlin and Kay (1969/1991) admitted the possibility of a 12th BCT in a language. Categorical partition of the BLUE area, i.e., two BCTs that differentiate light and dark(er) shades of blue, has since been confirmed for Russian, as well as for a number of Slavic, circum-Mediterranean and Far Eastern languages (for a review, see Paramei and Bimler 2022).

Anticipating the next subsection on the Russian (basic) CT inventory, we conclude this subsection by recalling Berlin and Kay’s (1969/1991, p. 6) criteria for the basicness of a colour term: it must (i) be monolexemic, (ii) not be subsumed under another BCT, (iii) apply not only to a limited class of objects, and (iv) be psychologically salient for all informants, i.e., appear early in elicited lists, be used frequently across contexts, and occur in all speakers’ idiolects.

Russian colour terms

In modern Russian, (psycho)linguistic studies consistently identify counterparts of ten Berlin–Kay BCTs: belyj ‘white’, čërnyj ‘black’, krasnyj ‘red’, zelënyj ‘green’, žëltyj ‘yellow’ (primary BCTs), and koričnevyj ‘brown’, oranževyj ‘orange’, fioletovyj ‘purple’, rozovyj ‘pink’, and seryj ‘grey’ (secondary BCTs) (Corbett and Davies 1995, 1997; Corbett and Morgan 1988; Davies and Corbett 1994; Frumkina 1984; Kul’pina 2001, 2007, 2019; Paramei et al. 2018; Vasilevich et al. 2016). Two distinct BCTs for ‘blue’, sinij ‘dark blue’ and goluboj ‘light blue’, make Russian an exceptional case of a 12-term inventory (Corbett and Davies 1995, 1997; Griber et al. 2018; Martinovic et al. 2020; Paramei 2005, 2007; Paramei et al. 2018). In frequency, sinij is comparable to the primary BCTs, whereas goluboj is comparable to the secondary ones (Laws et al. 1995; Vamling 1986).

Several recurring findings characterise Russian CTs:

  • PURPLE: Russian fioletovyj (lit. ‘violet’) most closely meets the BCT criteria for ‘purple’ but covers a limited denotative range (Paramei et al. 2018). The PURPLE area is lexically elaborated with highly frequent non-BCTs: in the bluish sub-area, fioletovyj “competes” with sirenevyj ‘lilac’ and lilovyj ‘mauve’; in the reddish sub-area, it is complemented by malinovyj ‘raspberry’, bordovyj ‘claret’, purpurnyj ‘cardinal red’, and višnëvyj ‘cherry-coloured’, as well as by older terms such as bagrovyj ‘crimson’ and bagrânyj ‘purplish-red’ (Frumkina 1984; Safuanova and Korzh 2007; Vasilevich 1983, 1988). In recent decades, the area has been further enriched by English loanwords such as fuksiâ ‘fuchsia’ and madženta ‘magenta’ (Griber et al. 2018, 2021; Paramei et al. 2018; Vasilevich et al. 2016).

  • BROWN: The BCT koričnevyj, introduced in the 17th century, ranks high in frequency, although lower than the other BCTs (Corbett and Davies 1995; Davies and Corbett 1994; Morgan and Corbett 1989). Its usage is predominantly restricted to combinations with nouns denoting artefacts, whereas its older counterpart buryj ‘dust brown’ denotes natural objects (Bochkarev et al. 2023; Krylosova and Tomachpolski 2013; Rakhilina and Paramei 2011). Analysis of the Russian sub-corpus of Google Books Ngram shows that koričnevyj has steadily extended its range of denoted objects, gradually expanding into the realm of natural objects and supplanting buryj (Bochkarev et al. 2023).

  • Richness and morphology: Russian exhibits great lexical richness in colour naming, including numerous hyponyms (e.g., vasil’kovyj ‘cornflower’, salatovyj ‘lettuce-coloured’). Furthermore, as “a highly inflectional language with a rich morphological system” (Masini and Benigni 2012, p. 447), Russian allows nuanced colour naming through a variety of suffixes expressing attenuative, approximative and/or evaluative meanings, such as -ovat- (rendered by English -ish) or -en’k- (affective diminutive). Polylexemic names are abundant: compounds (e.g., žëlto-zelënyj ‘yellow-green’), terms with achromatic modifiers such as ‘light’, ‘bright’, ‘dark’, ‘dull’, ‘pale’ or ‘tender’, and CTs of greater expressive complexity (e.g., glubokij mšisto-zelënyj ‘deep mossy green’). Terms of the pattern cveta X ‘colour of X’ are also common (e.g., cvet svežej travy ‘colour of fresh grass’) (Kharchenko 2023; Kravcova 2017; Kul’pina 2001, 2019; Rakhilina 2008; Vasilevich et al. 2016).

  • Neologism boom: In the post-Soviet era (since the 1990s), Russian has seen a “neologism boom”, an expression coined by Krylosova (2013) to capture the explosive growth of lexical innovations in the Russian colour vocabulary. It manifests itself in the ongoing emergence of novel CTs and in the increased use of previously infrequent terms, e.g., lavandovyj ‘lavender’ and metalličeskij ‘metallic’ (Griber et al. 2018, 2021; Krylosova 2013; Paramei et al. 2018). Another tendency, observed especially in the youth lexicon, is the truncation of canonical adjectival CTs derived from object names into bare noun stems (e.g., lajmovyj → lajm ‘lime’, baklažanovyj → baklažan ‘aubergine’), a metonymic pattern typical of English. These innovations are linked to a surge in international trade contacts, accompanied by a dramatic transformation of consumer markets and a substantial influx of Western products, whose advertising often includes Russian transliterations of English, French or Italian colour names (Griber and Mylonas 2015; Kravcova 2017; Krylosova 2013; Krylosova and Tomachpolski 2014, 2017; Marinova 2019, 2023a, b; Vasilevich 2003; Vasilevich et al. 2016).

Elicitation task

We applied the elicitation technique, or free-listing method, developed by Weller and Romney (1988), in which participants list as many domain-related terms as possible within a short time (typically 5 min), either in writing or orally. This technique yields rich, domain-relevant data and is powerful for ascertaining the salience of frequently mentioned items (Borgatti 1999; Quinlan 2019). The method has proved effective in the colour domain in several languages (Alldrick 2025; Borgatti 1998; Corbett and Morgan 1988; Del Viva et al. 2023; Jakovljev and Zdravković 2018; Uusküla 2007; Xu et al. 2023, to name just a few).

Across the obtained lists, one counts the number of times each CT is elicited, i.e., its frequency (F), and orders the terms by decreasing frequency: a few CTs are mentioned by many respondents, but many others are listed by only a few. The frequency (and its rank) reliably distinguishes BCTs from non-BCTs and can differentiate between primary and secondary BCTs (Corbett and Davies 1997; Davies and Corbett 1995). Sorted by frequency in descending order, the elicited CTs form a function that reflects the terms’ salience gradient and follows a roughly exponential decline. This function often shows an “elbow”, a natural break that identifies the most salient CTs as having basic status. These are separated from the non-BCTs (hyponyms, complex and idiosyncratic terms), which form stretches of slower decline with less pronounced drops (Bimler and Uusküla 2018; Del Viva et al. 2023; Jakovljev and Zdravković 2018; Kuriki et al. 2017; Lindsey and Brown 2014).
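To make the frequency measures concrete, here is a minimal Python sketch, assuming each participant’s cleaned list is a Python list of term strings. The function name `frequency_table` is hypothetical; counting each term at most once per list and giving tied terms the mean of their ranks (which produces values such as RF = 11.5) are our assumptions about the bookkeeping, not a description of the authors’ scripts.

```python
from collections import Counter

def frequency_table(lists):
    """Return (term, F, %F, RF) rows sorted by descending frequency.

    F counts the lists that contain a term (each list counted once),
    %F = 100 * F / N, and tied terms share the mean of their ranks.
    """
    n = len(lists)
    freq = Counter(term for lst in lists for term in set(lst))
    # descending F; alphabetical order only makes ties deterministic
    ordered = sorted(freq.items(), key=lambda kv: (-kv[1], kv[0]))
    rows, i = [], 0
    while i < len(ordered):
        j = i
        while j + 1 < len(ordered) and ordered[j + 1][1] == ordered[i][1]:
            j += 1  # extend the block of terms tied with ordered[i]
        mean_rank = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        for term, f in ordered[i:j + 1]:
            rows.append((term, f, 100 * f / n, mean_rank))
        i = j + 1
    return rows

# Illustrative toy lists (not study data)
lists = [
    ["krasnyj", "sinij", "belyj"],
    ["sinij", "krasnyj"],
    ["krasnyj", "belyj", "beževyj"],
    ["krasnyj"],
]
for row in frequency_table(lists):
    print(row)
```

Here belyj and sinij both occur in two of four lists, so they share the mean rank 2.5 and the next term receives rank 4.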

The second measure is the term’s position on the list: the closer a CT is to the top of the list, the more salient it is. Across lists, each term is indexed by its mean position (mP) and the corresponding rank. Sutrop (2001) introduced the Cognitive Salience Index (S), which combines a term’s F and mP and allows a clear-cut break between BCTs and non-BCTs. The S-index is widely used to operationalise Berlin and Kay’s criterion (iv).

Kazan and Smolensk: dialectal background

Kazan, the capital of the Republic of Tatarstan, in the southwest of Russia (55°47′ N 49°6′ E), lies 700 km east of Moscow on the Volga River (Fig. 1). It is an important cultural centre with a population of 1.31 million (Čislennost’ 2023). Historically part of Volga Bulgaria, the Golden Horde, and the Kazan Khanate, Kazan inherited Islamic-Turkic traditions alongside Finno-Ugric, Bulgar, and Russian influences. In modern Tatarstan, which has two official languages (Tatar and Russian), Russian is influenced by Tatar (Alldrick 2025). The Middle Russian dialects of the Volga-Kama region are shaped by contact with the Finno-Ugric and Turkic languages of the neighbouring republics of Mari El, Udmurtia, Chuvashia and Bashkortostan (Heard 2011; Vendina 2014).

Fig. 1

Map of European Russia; highlighted are the locations of Kazan and Smolensk.

Smolensk (54°45′ N 32°4′ E), 360 km west-southwest of Moscow on the Dnieper River (Fig. 1), has 312,896 inhabitants (Čislennost’ 2023); one of the oldest Russian cities, it is an administrative, industrial and cultural centre. As a city in the Belarusian-Russian borderland, Smolensk reflects strong Belarusian cultural influence. Its dialect belongs to the western group of Southern Russian dialects, closely linked to Belarusian.

The natural environments of Kazan and Smolensk are fairly similar: both cities have a continental climate with four distinct seasons (average +17 to +22 °C in summer, −9 to −12 °C in winter), and both lie at cultural crossroads. Yet their historical trajectories have fostered distinct linguistic environments.

Aims of the study

The present study pursues three aims:

  1. To assess the Russian CT lexicon in young native speakers through elicitation data, estimating the BCT inventory on the basis of frequency and cognitive salience.

  2. To explore the salience of high-frequency non-BCTs that may be emerging as culturally basic terms.

  3. To evaluate diatopic variation in CT inventories between Kazan and Smolensk, where regional lexical differences have been noted in semantic domains other than colour (Kazan: Safonova et al. 2014; Smolensk: Lunkova 2020).

The novel contributions of this study are: (i) quantifying the cognitive salience of the Russian BCTs (N = 12) and of about 20 high-frequency non-BCTs; (ii) documenting the post-Soviet “neologism boom” in the colour domain; (iii) analysing polylexemic expressions that enhance communicative efficiency; and (iv) identifying areas of colour space that are undergoing lexical refinement in Russian and aligning them with cross-linguistic trends.

Methods

Participants

Respondents from the two cities, Kazan and Smolensk, were comparable in age, education, and socio-economic background. The Kazan sample comprised students of Kazan Federal University (NK = 112; 79 females; aged 18–20 years). The Smolensk sample comprised students of Smolensk State University (NS = 143; 103 females; aged 16–32 years). All participants were native Russian speakers; some were simultaneous bilinguals who had acquired and used two languages from birth (e.g., Tatar in Kazan or Belarusian in Smolensk).

All participants reported normal or corrected-to-normal visual acuity. They were recruited as volunteers and were unaware of the study’s purpose. The study followed the guidelines of the Declaration of Helsinki and was approved by the respective local ethics committees. Written informed consent was obtained from all participants prior to testing.

Procedure

The elicitation task followed Davies and Corbett (1994). In Kazan, it was administered in person at the university; in Smolensk, either in person or online (via Zoom or Google Meet). Before data collection, participants received an Excel file (via Google Drive) containing the following instructions in Russian: “Please write as many colours as you know by typing them with your keyboard. You have 5 min for this task”. The experimenter started the timer and stopped the task after 5 min. Participants typed their responses in Cyrillic, and individual lists of CTs were recorded in the order in which the terms were produced.

Results

Individual CT lists (‘Raw data_Kazan’, ‘Raw data_Smolensk’) can be found in the Figshare repository: https://doi.org/10.6084/m9.figshare.27663297. The individual raw data were cleansed (see Supplementary Materials: ‘Details of data analysis → Data cleansing’) and then consolidated as group data (see ‘Preprocessed_data_Kazan’, ‘Preprocessed_data_Smolensk’ in https://doi.org/10.6084/m9.figshare.27663297). Analyses were conducted separately for the Kazan and Smolensk datasets (frequency estimates are available in ‘DATA_Kazan’, ‘DATA_Smolensk’ in https://doi.org/10.6084/m9.figshare.27663297). Cleansed lists of the Russian colour names were transliterated into Latin script using the free Online Transliterator (n/d). English glosses followed Frumkina and Mikhejev (1996).

Comparisons of the Kazan and Smolensk data samples focused on the following aspects:

  • the number, variety, and word composition of elicited CTs;

  • CT frequency and list position;

  • core colour inventory vs. less common CTs (Zipf-function);

  • Cognitive Salience Index (S; see above) of frequent CTs;

  • conceptual associations of frequent CTs, visualised by dendrograms and semantic maps.

The number and variety of elicited colour terms

We assessed the total number of unique colour names and the proportions of monolexemic vs. polylexemic CTs. For polylexemic CTs, we further analysed the distribution of part-of-speech (POS) compositions, using the pymorphy2 morphological analyser for POS tagging.
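The POS-composition tally can be sketched with an injected tagger, so that the example runs without extra dependencies. The helpers `pos_composition` and `composition_shares` and the toy tag dictionary are hypothetical; with pymorphy2 one could instead pass `tag=lambda w: morph.parse(w)[0].tag.POS`, where `morph = pymorphy2.MorphAnalyzer()`, applied to the original Cyrillic forms. Splitting on whitespace (so that hyphenated compounds count as one word) is our assumption.

```python
from collections import Counter

def pos_composition(term, tag):
    """Join the POS tags of a term's whitespace-separated words, e.g. 'NOUN+ADJF+NOUN'."""
    # str() guards against taggers that return None for unknown words
    return "+".join(str(tag(word)) for word in term.split())

def composition_shares(terms, tag):
    """Percentage of terms with each POS composition."""
    counts = Counter(pos_composition(t, tag) for t in terms)
    total = sum(counts.values())
    return {comp: 100 * c / total for comp, c in counts.items()}

# Hypothetical stand-in for a real tagger (OpenCorpora-style labels, as returned by pymorphy2)
toy_tags = {"krasnyj": "ADJF", "cvet": "NOUN", "morskoj": "ADJF",
            "volny": "NOUN", "fuksiâ": "NOUN", "lajm": "NOUN"}
terms = ["krasnyj", "cvet morskoj volny", "fuksiâ", "lajm"]
print(composition_shares(terms, toy_tags.get))
```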

Kazan participants produced 2775 items (362 unique CTs), while Smolensk participants produced 3918 items (423 unique CTs). Individual lists ranged from 9 to 46 items in Kazan (mean 24.1) and 16 to 42 in Smolensk (mean 27.0). (Full frequency-ordered lists are in Tables S1 and S2 of Supplementary Materials.)

The large variety of CTs reflects both sensitivity to colour shades and the need for their nuanced lexicalisation. In addition to the 12 Russian BCTs, both samples include numerous derived forms with achromatic modifiers (e.g., tëmno-seryj ‘dark grey’), substantive qualifiers (e.g., nebesno-goluboj ‘sky light blue’, lajmovo-zelënyj ‘lime green’), and compounds (e.g., rozovo-koričnevyj ‘pink-brown’). We also observe emotionally laden and/or aestheticising (evaluative) qualifiers (e.g., nežno-rozovyj ‘tender pink’, boleznenno-žëltyj ‘painfully yellow’), as well as (polylexemic) descriptions with the attenuative suffix -ovat- or the affective diminutive suffix -en’k- (e.g., želtovato-kremovyj ‘yellowish-creamy’, sinen’kij ‘endearing sinij’).

Derivational productivity differs across BCTs. In Kazan, the richest sets of polylexemic descriptors are for krasnyj ‘red’ (25), rozovyj ‘pink’ (20), sinij ‘dark blue’ (19), and zelënyj ‘green’ (19). In Smolensk, the most productive are sinij ‘dark blue’ (25), žëltyj ‘yellow’ (24), and zelënyj ‘green’ (22).

Both datasets contain frequent non-BCTs (with %F in Kazan and Smolensk, respectively): beževyj ‘beige’ (71, 69), birûzovyj ‘turquoise’ (61, 59), salatovyj ‘lettuce-coloured’ (52, 52), bordovyj ‘claret’ (51, 54), malinovyj ‘raspberry’ (43, 52), sirenevyj ‘lilac’ (32, 39). These also appear in modified, compounded or suffixed forms (e.g., svetlo-birûzovyj ‘light turquoise’, malinovo-seryj ‘raspberry-grey’, sirenevatyj ‘lilac-ish’). Many non-BCTs are motivated by food and beverages, (semi-)precious stones, objects of fauna and flora, metals, artefacts, and natural phenomena.

Composition of colour descriptors

As shown in Fig. 2 and Table S3, the types of colour descriptors are very similar in the two samples: most CTs are adjectives derived from object names (Kazan 70.9%, Smolensk 67.7%). Smolensk respondents produced non-canonical noun CTs (e.g., oxra ‘ochre’, mâta ‘mint’) more often than Kazan respondents (13.3% vs. 5.4%). The proportion of not-yet-conventionalised forms cvet X ‘colour of X’ or ottenok X ‘tint/shade of X’ was low: 6.5% in Kazan (16 items) and 2% in Smolensk (six items).

Fig. 2

Proportion of (polylexemic) colour names with different part-of-speech compositions in Kazan (left) and Smolensk (right).

Frequency of colour terms

For each CT, we calculated its frequency (F) and derived the corresponding rank (RF). Since the two samples differed in size, we mainly report relative frequency (%F = 100 × F/N), the percentage of participants’ lists containing the term. Tables S4 and S5 list the frequent CTs produced by ≥5 participants (Kazan: NK-F = 86; Smolensk: NS-F = 110), with their frequency (F, %F) and rank (RF).

Figure 3 plots the relative frequencies (%F) of the most frequent CTs (truncated, for brevity, to the top NF = 40 in each dataset). In both samples, the most frequent CTs make up 77% of responses. The 12 BCTs dominate the top ranks, though in different orders, with ‘black’ leading in Kazan and ‘red’ in Smolensk. The five most frequent non-BCTs (RF = 13–17) are identical across samples: ‘beige’, ‘turquoise’, ‘lettuce-coloured’, ‘claret’, ‘raspberry’. For the 75 CTs present in both the Kazan and Smolensk lists (Tables S4 and S5), frequency ranks (RF) correlate strongly between the cities (ρRF = 0.906, p < 0.001), indicating a largely stabilised colour lexicon.
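A pure-Python sketch of such a cross-city rank correlation, assuming `x` and `y` hold the paired RF values of the terms shared by both cities. It computes Spearman’s ρ as the Pearson correlation of tie-averaged ranks and omits the p-value (`scipy.stats.spearmanr` returns both); the helper names are hypothetical.

```python
def average_ranks(values):
    """Ascending 1-based ranks, with tied values given the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1  # extend the block of tied values
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def spearman_rho(x, y):
    """Spearman's rho as the Pearson correlation of the average ranks of x and y."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)

# Illustrative paired ranks (not study data)
x = [1, 2, 3, 4, 5]
y = [2, 1, 4, 3, 5]
print(round(spearman_rho(x, y), 3))
```

Re-ranking inside the function means the inputs can be raw RF values that already contain ties (e.g., 11.5).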

Fig. 3: Relative frequency (%F) of Russian colour terms (CTs), ranked 1–40, for the Kazan (top) and Smolensk (bottom) samples.

Bars are colour-coded by gloss.

Both %F functions decline exponentially. Guided by Borgatti (1998), we searched for natural breaks in the distribution. In both %F functions, we discern a distinct break between koričnevyj ‘brown’ (RF = 11.5) and beževyj ‘beige’ (RF = 13), which marks the boundary between BCTs and non-BCTs. In Kazan, a further break occurs between malinovyj ‘raspberry’ (RK-F = 17) and purpurnyj ‘cardinal red’ (RK-F = 18); in Smolensk, between birûzovyj ‘turquoise’ (RS-F = 15) and salatovyj ‘lettuce-coloured’ (RS-F = 16).
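One simple way to quantify such breaks, assumed here for illustration (the authors’ search may well have been visual), is to rank the drops between successive %F values. The helper `largest_drops` and the example values are hypothetical.

```python
def largest_drops(pct_f, k=2):
    """Return the k largest drops between successive values of a descending %F sequence.

    Each item is (i, drop): the break lies between the terms at positions i and i + 1.
    """
    drops = [(i, pct_f[i] - pct_f[i + 1]) for i in range(len(pct_f) - 1)]
    return sorted(drops, key=lambda d: d[1], reverse=True)[:k]

pct = [100, 98, 95, 90, 70, 65, 60, 45, 44]  # illustrative values, not study data
print(largest_drops(pct))  # [(3, 20), (6, 15)]
```

Because the curve declines roughly exponentially, absolute drops tend to favour the head of the list; comparing log ratios of successive values is a reasonable alternative.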

The Zipf-function

We estimated a lognormal function reflecting each term’s “popularity”; this function is intrinsically connected to a power-law relation (Mitzenmacher 2004), and we call it the Zipf-function here to align with the terminology of previous pertinent work. Following Lindsey and Brown (2014), we regressed the log-transformed proportion of participants listing a term (log%F) against the log-transformed frequency rank (log RF). On logarithmic axes, Zipf-functions typically yield a near-linear decline. We fitted the limbs of the Zipf-function by piecewise linear approximation using Muggeo’s (2003) algorithm (Python implementation) and estimated Wald-type 95% confidence intervals (CIs) for the slope coefficients.
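The trilinear fit can be sketched as follows. This is not Muggeo’s (2003) iterative algorithm: as a swapped-in, simpler technique, it searches exhaustively over pairs of observed x values as breakpoints and fits a continuous piecewise-linear (hinge) model by least squares, without confidence intervals. The helper name `fit_trilinear`, the `min_pts` guard and the synthetic data are hypothetical.

```python
import itertools
import numpy as np

def fit_trilinear(x, y, min_pts=3):
    """Continuous three-segment linear fit by exhaustive search over two breakpoints.

    x, y: 1-D arrays (e.g. log10 rank and log10 %F), with x sorted ascending.
    Returns (b1, b2, slopes, sse), where slopes are the three segment slopes.
    """
    x, y = np.asarray(x, float), np.asarray(y, float)
    best = None
    # candidate breakpoints are interior x values, keeping several points per segment
    for i, j in itertools.combinations(range(min_pts, len(x) - min_pts), 2):
        if j - i < min_pts:
            continue
        b1, b2 = x[i], x[j]
        # hinge basis: intercept, x, (x - b1)+, (x - b2)+ keeps the fit continuous
        X = np.column_stack([np.ones_like(x), x,
                             np.clip(x - b1, 0, None), np.clip(x - b2, 0, None)])
        coef, *_ = np.linalg.lstsq(X, y, rcond=None)
        sse = float(np.sum((y - X @ coef) ** 2))
        if best is None or sse < best[-1]:
            slopes = (coef[1], coef[1] + coef[2], coef[1] + coef[2] + coef[3])
            best = (b1, b2, slopes, sse)
    return best

# Illustrative synthetic data with kinks at ranks 12 and 20 (not study data)
ranks = np.arange(1, 61)
x = np.log10(ranks)
k1, k2 = np.log10(12), np.log10(20)
y = 2 - 0.05 * x - 1.95 * np.clip(x - k1, 0, None) + 0.6 * np.clip(x - k2, 0, None)
b1, b2, slopes, sse = fit_trilinear(x, y)
print(10 ** b1, 10 ** b2, np.round(slopes, 2))
```

Muggeo’s method additionally lets breakpoints fall between observed ranks and supplies their standard errors, which the grid search above does not.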

In large language corpora, Zipf-functions typically reveal two segments with different exponents, separating a kernel lexicon from a less popular lexicon used for specific communication (Cancho and Solé 2003). In the present context, two Zipf-function segments would be expected to reflect the division between BCTs and non-BCTs. However, recent studies of colour lexicons in other languages (Brown et al. 2016; Del Viva et al. 2023; Jakovljev and Zdravković 2018; Lindsey and Brown 2014) show that a three-segment Zipf-function fits better, differentiating frequent non-BCTs, i.e., potentially emergent BCTs (second segment), from rarer tertiary CTs (third segment) (Lindsey and Brown 2021).

Applying Muggeo’s (2003) algorithm to the high-frequency CTs (NK-F = 86, NS-F = 110), we identified three segments in both datasets (R² > 0.98) and estimated break-points and segment slopes (Fig. 4). Limb 1 comprises the 12 BCTs (shaded dark grey in Tables S4 and S5), listed by a great majority of participants (Kazan: %F ≥ 79.5; Smolensk: %F ≥ 82.5). Limb 1 slopes are close to zero: Kazan, −0.07 (95% CI [−0.09, −0.05]); Smolensk, −0.06 (95% CI [−0.08, −0.04]).

Fig. 4: Zipf-functions, or diagrams of the Russian colour-term “popularity”.

Limb slopes fitted with trilinear models; data points of limbs 1 and 2 are colour-coded by gloss. Kazan (left): Limb 1, 12 BCTs (RK-F = 1–12). Limb 2, seven highly popular non-BCTs (RK-F = 13–19). Limb 3, low-popularity non-BCTs (RK-F = 20–81.5; unfilled circles). Smolensk (right): Limb 1, 12 BCTs (RS-F = 1–12). Limb 2, 13 highly popular non-BCTs (RS-F = 13–24.5). Limb 3, low-popularity non-BCTs (RS-F = 26–102.5; unfilled circles).

For the CTs with RF ≥ 12, Muggeo’s (2003) algorithm marked a second, negatively sloping segment corresponding to highly frequent non-BCTs (shaded light grey in Tables S4 and S5). For Kazan, limb 2 has a slope of −1.95 (95% CI [−2.35, −1.55]) and includes seven popular non-BCTs (70.5% ≥ %F ≥ 32.1%): beževyj ‘beige’, birûzovyj ‘turquoise’, salatovyj ‘lettuce-coloured’, bordovyj ‘claret’, malinovyj ‘raspberry’, purpurnyj ‘cardinal red’, and sirenevyj ‘lilac’. For Smolensk, limb 2 has a slope of −1.21 (95% CI [−1.32, −1.10]) and comprises 13 popular non-BCTs (69.2% ≥ %F ≥ 34.3%), including, in addition, xaki ‘khaki’, fuksiâ ‘fuchsia’, alyj ‘scarlet’, persikovyj ‘peach’, and two terms for metallic brilliance, zolotoj ‘gold’ and serebrânyj ‘silver’.

Limb 3 includes all remaining non-BCTs, with slopes of −1.39 (95% CI [−1.44, −1.34]) for Kazan and −1.52 (95% CI [−1.56, −1.48]) for Smolensk. This segment comprises less common, tertiary CTs: Kazan, 30.4% ≥ %F ≥ 4.5% (ranked 20.5–81.5); Smolensk, 28.7% ≥ %F ≥ 3.5% (ranked 26–102.5). The Mann–Whitney U test showed no significant difference between the limb 1 slopes (p = 0.08), consistent with an equivalent BCT lexicon in the two locations. In contrast, the slopes of limbs 2 and 3 differ between the two datasets (p = 0.01 and p < 0.001, respectively).

Mean position of colour terms

For each CT, we calculated its mean position (mP). The derived rank (RmP) is a diagnostic measure that usually discriminates between primary and secondary BCTs. In Kazan, the six primary BCTs occupy the top RmP ranks (Table S4), in the order krasnyj ‘red’, zelënyj ‘green’, žëltyj ‘yellow’, sinij ‘dark blue’, čërnyj ‘black’, belyj ‘white’. In Smolensk (Table S5), the top six RmP ranks are taken by krasnyj ‘red’, sinij ‘dark blue’, žëltyj ‘yellow’, zelënyj ‘green’, čërnyj ‘black’ and goluboj ‘light blue’, whereas belyj ‘white’ (RmP = 8) falls within the rank range of the secondary BCTs.

Overall, the RmP correlation between the cities was moderate (ρRmP = 0.561, p < 0.001). However, as Fig. 5 suggests, the RmP values of the 13 top-ranked CTs (the 12 BCTs plus birûzovyj ‘turquoise’) correlate highly (ρRmP = 0.945, p < 0.001). The RmP correlation for eight further frequent non-BCTs is also high (ρRmP = 0.873, p < 0.001): bordovyj ‘claret’, malinovyj ‘raspberry’, beževyj ‘beige’, sirenevyj ‘lilac’, salatovyj ‘lettuce-coloured’, purpurnyj ‘cardinal red’, lilovyj ‘mauve’, and izumrudnyj ‘emerald’. For rarer non-BCTs (RmP > 21), the correlation was low. This is an expected outcome of listing behaviour: highly salient CTs appear near the start of a list, which limits the differences among their ranks, whereas less salient CTs can appear anywhere in a list, which allows larger ranking differences among them.

Fig. 5: Correlation of RmP for the most frequent colour terms (N = 75) across Kazan and Smolensk datasets.

Points for the terms with the highest RmP correlation (12 BCTs and nine frequent non-BCTs) are colour-coded.

Cognitive Salience Index

To integrate a CT’s frequency and mean position, we computed the Cognitive Salience Index (S) following Sutrop (2001):

$$S=F/(N\times \mathrm{mP})$$
(1)

where F is the number of participants listing the term, N the total number of participants, and mP the term’s mean list position. The S-index is independent of list length; it ranges from 0 (term absent from all lists) to 1 (term listed first by every participant), and it yields a Cognitive Salience ranking (RS) in descending order. An “elbow” in the S-index function provides an additional criterion for discriminating between basic and non-basic CTs. The S-index also allows cross-language comparison with previously published values.
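Equation (1) can be computed directly from the lists. A minimal sketch, assuming each participant’s cleaned list is a Python list of term strings and counting only a term’s first occurrence per list (our assumption); `salience_index` is a hypothetical name.

```python
def salience_index(lists):
    """Sutrop's Cognitive Salience Index S = F / (N * mP) for every listed term."""
    n = len(lists)
    positions = {}
    for lst in lists:
        seen = set()
        for pos, term in enumerate(lst, start=1):
            if term not in seen:  # first occurrence only
                seen.add(term)
                positions.setdefault(term, []).append(pos)
    s = {}
    for term, pos in positions.items():
        f = len(pos)          # F: number of lists containing the term
        mp = sum(pos) / f     # mP: mean position within those lists
        s[term] = f / (n * mp)
    # descending S gives the Cognitive Salience ranking
    return dict(sorted(s.items(), key=lambda kv: kv[1], reverse=True))

# Illustrative toy lists (not study data)
lists = [
    ["krasnyj", "sinij", "belyj"],
    ["sinij", "krasnyj"],
    ["krasnyj", "belyj", "beževyj"],
    ["krasnyj"],
]
for term, s in salience_index(lists).items():
    print(f"{term}: S = {s:.3f}")
```

A term that every participant lists in first position reaches S = 1, the upper bound of the index.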

Tables 1 and 2 present the CTs with the highest S-index (S > 0.01) in Kazan (NK-HS = 36) and Smolensk (NS-HS = 33). As expected, the 12 BCTs dominated (shaded dark grey in Tables 1 and 2). As illustrated by Fig. 6, in both samples krasnyj ‘red’ was the most salient term (Kazan: S = 0.215; Smolensk: S = 0.267), followed by the other five primary BCTs. Note the drop-off before the second most salient BCT, žëltyj ‘yellow’ in Kazan (S = 0.124) and sinij ‘dark blue’ in Smolensk (S = 0.142). Across the high-S CTs shared by the Kazan and Smolensk samples (NHS = 30), the RS correlation was high (ρRS = 0.98, p < 0.001), indicating a consistent Russian core colour inventory.

Fig. 6: Russian colour terms with the highest Cognitive Salience Index (S > 0.01) for Kazan (NK-HS = 36) and Smolensk (NS-HS = 33).

Bars are colour-coded by gloss.

Table 1 List of Russian colour terms with the highest Cognitive Salience Index (S) in Kazan.
Table 2 List of Russian colour terms with the highest Cognitive Salience Index (S) in Smolensk.

The lists of the most salient non-BCTs (0.02 ≤ S ≤ 0.04) overlap extensively (shaded light grey in Tables 1 and 2). In Smolensk, persikovyj ‘peach’, a ‘fancy’ term, and fuksiâ ‘fuchsia’, a recent loanword, were slightly more salient. Also more salient in this sample is xaki ‘khaki’, which we attribute to the conspicuous presence of uniformed personnel in Smolensk, which hosts a military academy.

Following Bimler and Uusküla (2021), we computed log-transformed estimates of normalised frequency Fi/N (where NK = 112, NS = 143) and mPi for each salient term i (named by at least five participants, as specified in Tables S4 and S5). Figure 7 presents scatterplots that depict CT’s mean position [−log10(mPi)] in the Kazan and Smolensk lists as a function of its normalised frequency [−log10(Fi/N)].
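The plotted coordinates follow directly from F, mP and N. A minimal sketch with a hypothetical helper `log_coordinates`; the example counts and positions are illustrative, not study data.

```python
import math

def log_coordinates(freq, mean_pos, n):
    """Map each term to (-log10(F_i / N), -log10(mP_i)).

    Larger x means a rarer term; larger y means an earlier mean position,
    so frequent, early-listed terms land at the top left of the plot.
    """
    return {t: (-math.log10(freq[t] / n), -math.log10(mean_pos[t])) for t in freq}

coords = log_coordinates({"krasnyj": 112, "beževyj": 56},
                         {"krasnyj": 1.0, "beževyj": 10.0}, n=112)
print(coords["beževyj"])
```

A term named by every participant in first position sits at the origin, the top-left corner of such a plot.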

Fig. 7: Scatterplots of log-transformed normalised frequency [−log10(Fi/N)] and mean position [−log10(mPi)] for salient colour terms elicited in Kazan (NK-F = 86) (left) and Smolensk (NS-F = 110) (right).

Points for the 12 BCTs and eight most frequent non-BCTs are colour-coded by gloss; unfilled circles indicate the remaining frequent non-BCTs listed in Tables S4 and S5.

In Fig. 7, Fi/N decreases from left to right, whereas mPi increases (i.e., terms appear later in the lists) from top to bottom. At the top left are the 12 BCTs, i.e., the terms most readily retrieved when participants are asked to “think of colour names”. From there, other terms align along a steep linear segment, where Fi/N decreases only slightly but mPi increases more rapidly. This trajectory then reaches an “elbow”, which marks the transition to a second, less steep linear segment characterised by a sharper, progressive decline in frequency. In both datasets, this “elbow” lies around birûzovyj ‘turquoise’: it borders the least basic BCT, koričnevyj, and abuts the (next) most frequent non-BCTs (salatovyj, bordovyj, beževyj and malinovyj). Further to the right, the remaining non-BCTs diverge into a wider scatter.

Conceptual closeness of frequent colour terms

We assumed that in a semantic network, closely associated terms tend to prime each other and, thus, occur in the lists in close proximity, with one term often immediately preceding or following the other (Friendly 1977). Following Uusküla and Bimler (2016), we estimated an association measure of ‘adjacency’, ADJ(i,j), between the i-th and the j-th CTs. (For adjacency formulae, see Supplementary Materials: ‘Details of data analysis → Measure of colour-term adjacency’.) To reconstruct patterns of conceptual closeness among the elicited CTs, we analysed ADJ(i,j) matrices by applying hierarchical cluster analysis (HCA), with inter-term adjacencies as entries. For the analysis, we included CTs with S > 0.01, whose ADJ(i,j) could be estimated with confidence: Kazan: NK-HS = 36; Smolensk: NS-HS = 33 (adjacency matrices are available in https://doi.org/10.6084/m9.figshare.27663297).
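Because the exact ADJ(i,j) formula is given only in the Supplementary Materials, the sketch below uses a deliberately simplified, hypothetical proxy: for each pair of terms, the share of co-listing participants who wrote the two terms immediately next to each other. It illustrates the kind of symmetric matrix that HCA takes as input; it is not Uusküla and Bimler’s (2016) measure, and `adjacency_proxy` is a hypothetical name.

```python
from collections import defaultdict
from itertools import combinations

def adjacency_proxy(lists, terms):
    """Hypothetical adjacency proxy, NOT the ADJ(i,j) of Uusküla and Bimler (2016).

    Entry [i][j] = (lists in which terms i and j are written next to each other)
                   / (lists containing both terms).
    """
    index = {t: k for k, t in enumerate(terms)}
    adjacent = defaultdict(int)
    colisted = defaultdict(int)
    for lst in lists:
        present = sorted({t for t in lst if t in index}, key=index.get)
        for a, b in combinations(present, 2):
            colisted[a, b] += 1
        # unordered pairs of selected terms that are immediate neighbours in this list
        pairs = {tuple(sorted((a, b), key=index.get))
                 for a, b in zip(lst, lst[1:])
                 if a in index and b in index and a != b}
        for key in pairs:
            adjacent[key] += 1
    m = [[0.0] * len(terms) for _ in terms]
    for (a, b), c in colisted.items():
        i, j = index[a], index[b]
        m[i][j] = m[j][i] = adjacent[a, b] / c
    return m

# Illustrative toy lists (not study data)
terms = ["krasnyj", "sinij", "goluboj"]
lists = [["krasnyj", "sinij", "goluboj"],
         ["sinij", "goluboj", "krasnyj"],
         ["goluboj", "krasnyj"]]
m = adjacency_proxy(lists, terms)
print(m)
```

In this toy example sinij and goluboj are adjacent in every list that contains both, so their entry is 1.0.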

Outcomes of HCA for Kazan and Smolensk are presented as dendrograms (Fig. 8). The dendrograms bring out CT clusters, i.e., the ‘chunking’ of terms that tended to emerge as self-contained sublists within the listing sequence: listing any one CT made it easier to include the rest of its ‘chunk’ before continuing. Figure 8 reveals strong clustering of BCTs: at the highest agglomerative level, in both samples, they are conceptually distinct from frequent non-BCTs. Within the BCT clusters, sinij and goluboj are robustly linked. The placement of koričnevyj ‘brown’, however, differs between the samples: it is grouped with the other BCTs in Smolensk but with beževyj ‘beige’ in Kazan. The two samples also differ slightly in the distinction pattern within the BCT cluster: whereas Kazan speakers contrast chromatic and achromatic BCTs, Smolensk speakers separate BCTs denoting vivid chromatic colours from achromatic BCTs, which are joined by the BCTs denoting low-saturation chromatic colours, rozovyj and koričnevyj.

Fig. 8

Dendrograms of colour terms with S > 0.01 in Kazan (top) and Smolensk (bottom), colour-coded by gloss.

At the intermediate agglomerative level, associational connections vary between clusters of non-BCTs: some comprise CTs of comparable salience, while other clusters manifest a gradient in shared chromatic content or an achromatic distinction. The grounds for association differ slightly between the two samples. In the Kazan dendrogram, for instance, one cluster comprises frequent hyponyms denoting bluish (‘lilac’, ‘lavender’) and reddish shades (‘cardinal red’, ‘fuchsia’) of the PURPLE area; the rightmost cluster groups two entrenched phrasal units (‘colour of sea wave’, sero-buro-malinovyj). In Smolensk, by comparison, the rightmost cluster groups non-BCTs denoting colours of low chromaticity (‘beige’, ‘lilac’), and in another cluster, ‘dark’-modified terms (tëmno-sinij, tëmno-zelënyj) coalesce.
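The agglomerative procedure behind dendrograms such as those in Fig. 8 can be sketched in a few lines. This sketch assumes average (UPGMA) linkage and a distance of the form 1 − ADJ(i,j). The linkage method and distance transformation actually used are described in the Supplementary Materials, not here, so both are assumptions. The toy distances are invented.

```python
from itertools import combinations


def average_linkage(labels, dist):
    """Agglomerative clustering with average (UPGMA) linkage.

    dist maps frozenset({a, b}) to the distance between items a and b,
    e.g. 1 - ADJ(a, b). Returns the merge history as a list of
    (cluster_1, cluster_2, merge_height) tuples, i.e. a dendrogram.
    """
    clusters = [(label,) for label in labels]

    def between(c1, c2):
        # Average linkage: mean of all pairwise item distances.
        return sum(dist[frozenset((a, b))] for a in c1 for b in c2) / (len(c1) * len(c2))

    merges = []
    while len(clusters) > 1:
        # Find the closest pair of current clusters and merge them.
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda ij: between(clusters[ij[0]], clusters[ij[1]]))
        merges.append((clusters[i], clusters[j], between(clusters[i], clusters[j])))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return merges


# Toy distances (invented): two tight pairs that are far from each other.
labels = ["sinij", "goluboj", "belyj", "čërnyj"]
close = {("sinij", "goluboj"): 0.1, ("belyj", "čërnyj"): 0.2}
dist = {frozenset(p): close.get(p, 0.9) for p in combinations(labels, 2)}
for c1, c2, h in average_linkage(labels, dist):
    print(c1, "+", c2, "at", round(h, 2))
```

The merge heights are what a dendrogram draws on its vertical axis. Clusters that join only near the top, like the two pairs here, correspond to the ‘highest agglomerative level’ distinctions discussed above.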

Along with HCA, we processed the ADJ(i,j) matrices using multidimensional scaling (MDS). This enabled us to reconstruct semantic maps of CTs, an auxiliary way of representing their conceptual closeness. (For algorithms, see Supplementary Materials: ‘Details of data analysis → HCA and MDS analyses of conceptual closeness of colour terms’.) Semantic maps visualise CTs geometrically as points in a low-dimensional space, where inter-point distances reflect the corresponding inter-term adjacencies, while dimensions represent semantic attributes that govern the succession of recalled CTs.
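The general idea of turning a distance matrix into a map can be illustrated with classical (Torgerson) MDS. This is a swapped-in textbook variant chosen for brevity; the MDS algorithm actually applied is described in the Supplementary Materials and may differ, for example by being non-metric. The sketch assumes NumPy is available, and the toy matrix is invented.

```python
import numpy as np


def classical_mds(D, k=2):
    """Classical (Torgerson) MDS: embed an n x n distance matrix D in k dimensions."""
    D = np.asarray(D, dtype=float)
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n      # centring matrix
    B = -0.5 * J @ (D ** 2) @ J               # double-centred squared distances
    vals, vecs = np.linalg.eigh(B)            # eigenvalues in ascending order
    idx = np.argsort(vals)[::-1][:k]          # keep the k largest eigenvalues
    scale = np.sqrt(np.clip(vals[idx], 0, None))
    return vecs[:, idx] * scale               # one row of coordinates per item


# Toy distance matrix (invented), e.g. 1 - ADJ for three terms.
D = [[0.0, 3.0, 4.0],
     [3.0, 0.0, 5.0],
     [4.0, 5.0, 0.0]]
coords = classical_mds(D, k=2)
print(np.round(coords, 3))
```

The coordinates are determined only up to rotation and reflection. Interpreting the map's axes as semantic attributes is therefore an analytic step layered on top of the embedding, not something the algorithm supplies.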

Semantic maps for Kazan and Smolensk (Fig. S1) and their discussion, as well as details of MDS solutions and coordinates of CT-points, are presented in Supplementary Materials: ‘Semantic maps of the elicited colour terms’. Here, we briefly note that, when superimposed onto the semantic maps, the HCA clusters provide additional insight into the semantic features of conceptually close terms (van der Klis and Tellings 2022).

Discussion

General factors shaping colour-term inventories

The development of the colour-term vocabulary in a language is influenced by both universal and culture-specific factors. As surmised by Berlin and Kay (1969/1991), the fine-grainedness of a colour-naming system is associated with societal complexity and the level of industrialisation. As demonstrated more recently by Majid et al. (2018), the expressibility of discriminated colours (“codability”) is higher in larger populations, since a higher likelihood of interacting with strangers increases the need for more specific vocabulary. Thus, the emergence of fine-grained meanings corresponding to colours and shades of lightness is driven by the need for efficient communication, ensuring that colour descriptions can be made consistently and accurately (Conway et al. 2020; Gibson et al. 2017; Zaslavsky et al. 2019, 2022). Lexical refinement in the colour domain is driven both by “nature”, i.e., variation in reflectance encountered in the visual environment (Baddeley and Attewell 2009; Komarova and Jameson 2008), and by “culture”, i.e., the degree of interest in colour (Kemp et al. 2018; Regier et al. 2016) and exposure to colour variation in artefacts (Josserand et al. 2021).

Richness of Russian colour vocabulary

Our data reveal an extensive lexical variety in the modern Russian colour vocabulary (Tables 1, 2, and S1–S5): alongside BCTs and hyponyms, young speakers use numerous polylexemic terms and phrasal units (Fig. 2). The total number of unique colour names, 362 in Kazan and 423 in Smolensk, far exceeds earlier findings for Moscow (126 terms; Davies and Corbett 1994) or St. Petersburg (146 terms; Uusküla and Bimler 2016). This suggests that the Russian colour lexicon has expanded, partly through neologisms and loanwords introduced in recent decades.

Individual list lengths reinforce this trend: on average, 24.1 items in Kazan (range 9–46) and 27.0 in Smolensk (range 16–42), compared to about 20 terms in earlier studies. The discrepancy may partly be explained by differences in data collection: in earlier studies, CTs were written down by respondents or elicited orally and written down by the experimenter, whereas our participants typed their responses directly. Demographic factors, namely gender and age, may also matter. Earlier Russian studies included a greater proportion of male participants, and men tend to list fewer CTs than women (Krimer-Gaborović and Jakovljev 2022). Participants in earlier Russian studies were also somewhat older (between 18 and 65 years) and may therefore have exhibited a smaller colour vocabulary than speakers in their 20s, such as our participant cohort (cf. Griber et al. 2021).

Core Russian colour inventory

Approximately 25 terms, very similar in both datasets, show the highest cognitive salience (Tables 1 and 2), forming a shared core Russian colour inventory. Unsurprisingly, the 12 BCTs dominate. Krasnyj ‘red’ consistently ranks highest, which attests to the persistence of its cognitive salience for Russian speakers over a considerable period of time (Bimler and Uusküla 2014; Davies and Corbett 1994; Morgan and Corbett 1989). It is followed by zelënyj ‘green’, žëltyj ‘yellow’, and sinij ‘dark blue’, denoting the canonical ‘cardinal hues’, along with the achromatic čërnyj ‘black’ and belyj ‘white’. Secondary BCTs rank below these. Notably, in Kazan, čërnyj has the highest frequency (RK-F = 1) and a higher salience (RK-S = 3) than in Smolensk (RS-F = 2.5 and RS-S = 5, respectively), plausibly reflecting the symbolic prominence of black in Kazan’s cultural and religious environment (Doherty 2017). The two “Russian blues” are both highly salient: sinij (RK-S = 5, RS-S = 2) ranks among the primary BCTs and goluboj (RK-S = 9, RS-S = 7) among the secondary BCTs. Their basic status confirms earlier (psycho)linguistic findings (e.g., Corbett and Morgan 1988; Paramei 2005, 2007).

The salience ranking of the Russian BCTs overall resembles that in many other languages (e.g., Corbett and Davies 1995; Krimer-Gaborović and Jakovljev 2022; Moreira et al. 2024; Uusküla and Sutrop 2007). The notable exception is goluboj; as a 12th BCT, however, it has counterparts in other languages with two “basic blues”, such as Greek (Androulaki et al. 2006), Italian (Del Viva et al. 2023), Japanese (Kuriki et al. 2017), Turkish (Bimler and Uusküla 2014), and Spanish dialects (Lillo et al. 2018).

Non-BCTs with the highest salience included beževyj ‘beige’, birûzovyj ‘turquoise’, bordovyj ‘claret’, salatovyj ‘lettuce-coloured’, and malinovyj ‘raspberry’. These five terms were among the most frequent non-BCTs 30 years ago (Davies and Corbett 1994), as well as more recently (Bimler and Uusküla 2014), and are likely emerging BCTs.

Notably, this mirrors the development of the most “popular” non-BCTs into BCTs documented in several other languages. Specifically, for ‘turquoise’, denoting an insertion category at the border of the BLUE and GREEN areas, there is accumulating evidence in German (Jones 2013; Zollinger 1984), American English (AE; Lindsey and Brown 2014), and Italian (Del Viva et al. 2023); Mylonas and MacDonald (2016) argue that turquoise has augmented the basic colour inventory of British English (BE). ‘Beige’, a term originating from French (Morgan 1993) that emerged to lexicalise a hard-to-name area between YELLOW and BROWN, is also rising steeply in salience in many languages (Boynton 1997; Boynton and Olson 1987; Eessalu and Uusküla 2013; Jakovljev and Zdravković 2018; Lindsey and Brown 2014). The other frequent non-BCT, salatovyj ‘lettuce-coloured’, is ostensibly a counterpart of the popular AE lime, which partitions the light-green area (Lindsey and Brown 2014).

In salience ranking, several other Russian non-BCTs are close successors of the five “popular” non-BCTs, such as sirenevyj ‘lilac’, purpurnyj ‘cardinal red’, lilovyj ‘mauve’, alyj ‘scarlet’, xaki ‘khaki’, zolotoj ‘golden’, persikovyj ‘peach’, izumrudnyj ‘emerald’ and the ‘fancy’ term fuksiâ ‘fuchsia’. These terms, widely used in Russian, were also among the most frequently listed non-BCTs in earlier studies (Davies and Corbett 1994; Uusküla and Bimler 2016).

We scrutinised the productivity measures of sirenevyj (RK-F = 19, RS-F = 21), whose counterparts are highly salient in many languages (Bimler and Uusküla 2014; Epicoco et al. 2024; Jraissati et al. 2012). In AE, lilac greatly overlaps with lavender, both emerging BCTs that partition the PURPLE area (Lindsey and Brown 2014); in BE, lilac is argued to have emerged as a BCT (Mylonas and MacDonald 2016). In Russian, by contrast, sirenevyj is declining in salience: 30 years ago, it was the most frequent term beyond the 12 BCTs (RF = 13; Davies and Corbett 1994; Morgan and Corbett 1989); 10 years ago, it ranked 17th in salience (Bimler and Uusküla 2014); and in our study its rank is lower still (RK-S = 19, RS-S = 20). The decline in sirenevyj frequency may be caused by an increased use of the denotatively close lilovyj (cf. Safuanova and Korzh 2007): 30 years ago, the RS-lag of lilovyj behind sirenevyj was 9 (Davies and Corbett 1994), but 10 years ago it was just 2 (Bimler and Uusküla 2014). It seems that Russian speakers tend to denote light purple colours by lilovyj instead of sirenevyj (Marjieh et al. 2024). In addition, sirenevyj might be supplanted by the novel lavandovyj ‘lavender’, which was absent from Davies and Corbett’s list in the 1990s but is of relatively high salience here: RK-S = 35, RS-S = 41. (As illustrated by Fig. 8, in the Kazan dendrogram, sirenevyj and lavandovyj are closely associated.) This trend echoes semantic competition among closely related terms and suggests lexical restructuring within the PURPLE area.

In total, seven (Kazan) and 13 (Smolensk) non-BCTs constitute the second segment of the Zipf-functions (Fig. 4). Partial divergence in the composition and number of the popular non-BCTs, along with the slope difference, points to diatopic variation between Kazan and Smolensk, conceivably reflecting chromatic environment statistics in these locations.

The productivity measures of highly popular non-BCTs (up to 13) suggest that they form an essential part of Russian speakers’ core colour inventory, alongside the 12 BCTs. We attribute the great similarity of the core inventories of the two populations, residing over 1000 km apart, to educational centralisation in Russia, with wider access to canonical culture, standardised educational resources, and explicit instruction (cf. Majid et al. 2018).

Notably, the Russian core inventory is strikingly similar to Lindsey and Brown’s (2014) AE core lexicon of 20 named colour categories, comprising the 11 English BCTs plus several non-BCTs: teal (overlapping with turquoise), beige, lime, lavender (greatly overlapping with lilac), and maroon (a close counterpart of burgundy, itself the counterpart of Russian bordovyj ‘claret’).

Tertiary non-BCTs: recurring object references

Both participant samples offered numerous hyponyms (Tables S1 and S2). Tertiary non-BCTs mostly refer to specific objects (predominantly natural) integral to the life of Russians (cf. Griber et al. 2018). Frequent are CTs derived from (semi-)precious stones (e.g., rubinovyj ‘ruby’); berries, fruits and vegetables (e.g., višnëvyj ‘cherry’, morkovnyj ‘carrot’); flowers and trees (e.g., vasil’kovyj ‘cornflower’, kaštanovyj ‘chestnut’); food and beverages (gorčičnyj ‘mustard’, kofejnyj ‘coffee’); metals and alloys (e.g., bronzovyj ‘bronze’). These tertiary CTs exhibit regional variation shaped by “vernacular visuality” (Mitchell 2002) of salient referential objects (cf. Rubio-Fernandez 2021). (See Supplementary Materials: ‘Tertiary non-BCTs: regiolect variation’.)

The source-based derivation of Russian CTs parallels such models in other languages: items from similar semantic classes of objects (though culturally and ethnically specific) are widely used as referents in English (Casson 1994; Matschi 2004), German (Jones 2013), Polish (Kul’pina, 2001), Spanish (Lillo et al. 2018), Swedish (Bergh 2007), to name just a few languages.

We conclude this subsection with a remark on the slope of the third segment of the Zipf-function (Fig. 4), which represents the frequency of tertiary non-BCTs. Our limb 3 slopes (Kazan: −1.39, Smolensk: −1.52) are close to the limb 3 slope for Italian (−1.21; Del Viva et al. 2023) but considerably shallower than the third-segment slopes for AE (−3.32; Lindsey and Brown 2014) or Serbian (−2.96; Jakovljev and Zdravković 2018). Drawing on Linders and Louwerse’s (2023) study of the psychological mechanisms behind Zipf-function exponents, we conjecture that the cross-study discrepancy in slope estimates stems from the task instructions. Specifically, in the AE and Serbian studies, participants were instructed to use solely monolexemic CTs, with no intrinsic modifiers, whereas the Russian and Italian studies allowed any CTs. The “monolexemic” instruction probably implied a greater economisation of linguistic effort, manifested in steeper slopes; in contrast, the “unconstrained” instruction likely encouraged a more diversified, less economical language, which has been shown to yield shallower slopes in Zipf-functions (Linders and Louwerse 2023, pp. 81–82).
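How a segment slope such as −1.39 can be obtained is sketched below. The sketch assumes that a limb's slope is the ordinary least-squares slope of log frequency on log rank over a given rank range. The study's own procedure for locating segment boundaries is not described in this section, so the boundaries are simply passed in here. The function name and the synthetic power-law data are hypothetical.

```python
import math


def loglog_slope(freqs, start, end):
    """Least-squares slope of log10(frequency) against log10(rank).

    freqs: term frequencies sorted in descending order (rank 1 first).
    start, end: 1-based, inclusive rank boundaries of one segment (limb).
    """
    ranks = range(start, end + 1)
    xs = [math.log10(r) for r in ranks]
    ys = [math.log10(freqs[r - 1]) for r in ranks]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


# Synthetic frequencies following an exact power law f(r) = 500 * r**-1.4.
freqs = [500 * r ** -1.4 for r in range(1, 101)]
print(round(loglog_slope(freqs, 30, 100), 3))
```

Because the slope is fitted in log-log space, a steeper (more negative) value means frequency drops off faster with rank, i.e., the long tail of tertiary terms is thinner. This matches the reading of AE and Serbian as more "economical" than Russian and Italian.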

Colour names as phrasal units

The nuanced colour naming observed in both lists can be explained, to a certain degree, by the use of linguistic units of different levels (Marinova 2023a). As Table S3 shows, some polylexemic colour names are phrasal units based on metaphorical motivation, either entrenched or idiosyncratic, or are new calques. The entrenched sero-buro-malinovyj (lit.) ‘grey-dust brown-raspberry’, a jocular vernacular expression used by Russian speakers to denote a drab, dull, nondescript colour shade (Bochkarev et al. 2023), stands out as a highly salient phrasal unit.

We also observe instances of syntactic units of the form cvet X ‘colour of X’, a model that is reported to be entering the language gradually (Rakhilina 2007). Some of these are entrenched in Russian, like the “transparent” cvet morskoj volny ‘colour of sea wave’, a close synonym of birûzovyj ‘turquoise’ (Frumkina 1984), or the playful vernacular cvet detskoj neožidannosti ‘colour of a child’s surprise’, paralleling the French caca du Dauphin (Kharchenko 2023; Kul’pina 2001, 2019; Rakhilina 2008; Vasilevich et al. 2016).

New developments in the Russian colour lexicon

Our lists contain plentiful novel CTs, as well as colour nominations in non-canonical grammatical forms. This invites an analysis of colouristic neologisms in relation to the active transformations of the Russian colour vocabulary in the 1990s and the following decades (Krylosova and Tomachpolski 2014, 2017; Marinova 2019, 2023a, b; Vasilevich et al. 2016).

As highlighted in the Introduction, the “neologism boom” in the colour domain is a consequence of the dramatic changes in post-Soviet Russia driven by market liberalisation, with the influx of Western products and, with them, the import of novel CTs and new ways of colour nomination (Krylosova 2013). Via advertising discourse, loanwords from English, French and Italian entered Russian and expanded the referent scope of CTs (Kravcova 2017; Krylosova and Tomachpolski 2014, 2017; Marinova 2019, 2023a, b; Nasibullina 2010; Vasilevich et al. 2000, 2016). In many instances, CT transliterations were grammatically adapted as adjectival forms, with a suffix and ending canonical in Russian. Our lists, elicited from young Russians who are active internet users and consumers, attest to such novel terms, although their frequency is low, e.g., pudrovyj ‘powder-coloured’, metalličeskij ‘metallic’, amarantovyj ‘amaranth’, nudovyj ‘nude’, etc. In other cases, the terms were adopted as Russian translations, exemplified by lososevyj ‘salmon’, which is rising in frequency. Many of these terms are absent from current explanatory dictionaries and are recorded only in specialised Russian dictionaries or catalogues.

Our lists confirm another observation, namely of direct borrowings from English, French and Italian (Kravcova 2017; Krylosova 2013; Krylosova and Tomachpolski 2014, 2017; Marinova 2019, 2023a, b). Albeit at low frequencies, we record transliterations of foreign CTs functioning as object nouns, e.g., madženta ‘magenta’, tiffani ‘tiffany blue’, militari ‘military’, nûd ‘nude’, kapučino ‘cappuccino’, samo ‘salmon’, etc.

Across participant lists, we notice signs of grammatical instability in some novel terms: e.g., along with fuksiâ, there are instances of cvet fuksii ‘colour of fuchsia’ (the normative genitive construction in Russian) but also cvet fuksiâ ‘fuchsia colour’, marked by non-inflexion of the noun. The latter exemplifies a novel syntactic model that is spreading under the influence of advertising texts (Krylosova and Tomachpolski 2017; Marinova 2023a, b).

Notably, non-canonical nominal CTs eventually became models for the grammatical transformation of traditional Russian CTs, whereby canonical adjectives are truncated into nouns (Krylosova and Tomachpolski 2014). In both lists, we record such nominal CTs, e.g., višnevyj → višnâ ‘cherry’, grafitnyj → grafit ‘graphite’, kobal’tovyj → kobal’t ‘cobalt’, šokoladnyj → šokolad ‘chocolate’, etc.

Interestingly, almost all loanwords in our lists, whether transliterated or translated, have widely known Russian equivalents (Krylosova and Tomachpolski 2017): e.g., bordovyj for burgundi; cvet morskoj volny for akva; izumrudnyj for emerald; zaŝitnyj for militari; or telesnyj for nûd. However, a borrowed neologism generally does not occupy the entire “niche” of the respective Russian CT: the alleged equivalents develop denotative nuances apparent to a Russian speaker (Marinova 2019, 2023a, b).

More important is the axiological aspect of novel coloratives (Vasilevich et al. 2016): loanwords, perceived as original and fresh, are used in product marketing to evoke connotations of “prestige” and “elegance” of the advertised product and thus to motivate favourable consumer decisions (Krylosova 2013; Krylosova and Tomachpolski 2014, 2017; Marinova 2019, 2023a, b; Skorinko et al. 2006; Vasilevich 2003).

In our lists, we see novel metaphoric phrasal units of French origin, which entered Russian from fashion catalogues and internet forums, e.g., cvet bedra ispugannoj nimfy (lit.) ‘colour of a frightened nymph’s thigh’, a loan translation of the fancy French term cuisse de nymphe effrayée. Also present are idiosyncratic expressions like cvet vlûblennoj lâguški ‘colour of a frog in love’ (Kazan) or cvet lâguški v obmoroke ‘colour of a fainted frog’ (Smolensk). As observed by Biggam (2012) for English, such ‘fancy’ object-derived loanwords frequently occur in the younger generation’s parlance, with colour references transparent to this generation’s speakers.

Finally, both samples contain a noteworthy group of novel phrasal units that reflect the tendency towards analyticity characteristic of contemporary Russian (Marinova 2023a; Masini and Benigni 2012; Ohnheiser 2019) and that are inspired by marketing and advertising narratives. Such nominations are multiword “creative” CTs: nuanced (frequently evaluative) descriptions of chromatic properties (cf. Biggam 2012; Casson 1994) or metaphors alluding to the speakers’ visual environment (Kravcova 2017).

As argued by Rubio-Fernandez (2021), such “redundant colour words” make the referent colour easier to discriminate within the entire visual context, which, in turn, enhances referential communication. In our lists, “redundant colour words” are exemplified by adjective chains, like amarantovyj gluboko-purpurnyj ‘amaranth deep purple’ or marmeladno-goluboj ‘marmalade goluboj’, and [ADJ, N] compositions, e.g., cvet mokroj sireni ‘colour of wet lilac’, staroe zoloto ‘old gold’ (Kazan); nočnoe nebo ‘night sky’, morskaâ sol’ ‘sea salt’, ržavaâ med’ ‘rusty copper’ (Smolensk).

Patterns of conceptual association of Russian colour terms

The patterns of conceptual closeness of Russian CTs reveal the great extent of commonality between Kazan and Smolensk samples, implying that these language communities impose similar patterns of associations and interrelationships upon their core colour lexicon.

The governing conceptual theme is cognitive salience, reflected at the highest agglomerative level: cluster analyses reveal a clear separation between BCTs and non-BCTs. Within the BCT cluster, there is a distinction between subclusters of high- and low-chromatic BCTs and achromatic BCTs. The semantic maps, in addition, revealed a finer inter-term distinction that reflects the salience gradient of individual CTs, in accord with findings for other languages (Bimler and Uusküla 2014; Del Viva et al. 2023; Moreira et al. 2024). Non-BCT clusters reveal varying grounds of conceptual distinction: some group CTs by their semantic similarity and contiguity, others by the number of features (colour attributes) the CTs share (cf. Uusküla and Bimler 2016), or by cultural associations and collocations that are likely to contribute to mutual CT priming (cf. Ronga et al. 2014).

The coexistence of these different grounds for clustering reflects possible trajectories through the elicited names, where an individual participant’s chain of associations might jump between, say, more- and less-basic chromatic terms, or between lighter and darker concepts. Some participants list primary chromatic BCTs and progress to secondary chromatic BCTs and chromatic non-BCTs; others, after the ‘cardinal hues’, list ‘white’ and ‘black’ and from there follow the semantic links to ‘grey’; still others systematically try to exhaust all variants of (say) ‘blue’ before moving on to (say) ‘green’.

Ongoing lexical refinement of colour space in Russian

An abundance of hyponyms and modified terms indicates that certain colour space areas are particularly prone to further lexical differentiation in Russian: BLUE, BLUE-GREEN, GREEN, PURPLE, and YELLOW-ORANGE-PINK. (For details, see Supplementary Materials: ‘Russian non-BCTs as indicators of ongoing colour space lexical refinement’.) The vast majority of colour names in these areas are traditional and were attested in previous (psycho-)linguistic studies of Russian CTs (Bimler and Uusküla 2014; Davies and Corbett 1994; Frumkina 1984; Morgan and Corbett 1989; Safuanova and Korzh 2007). Many of the recorded novel terms indicate attention to distinctiveness and allow a higher degree of lexical precision, creating new reference points within these colour space areas (Tribushinina 2008).

Similar processes of lexical refinement are attested cross-linguistically. Across languages of highest communicative efficiency, lexical refinement is observed in three colour clusters – cool (blue, green), warm (red, orange, yellow, brown), and intermediate (purple, pink) (Conway et al. 2020). The semantic alignment in colour lexicon enrichment across languages (Thompson et al. 2020) is credited to globalised cultural transmission through travel, media, trade and technological developments (Mylonas et al. 2022; Xu et al. 2013).

Limitations and future directions

Our study is limited by geography (restricted to European Russia), gender imbalance, and disciplinary heterogeneity among participants. Future work could (i) expand to other regions, including Asian Russia, (ii) balance gender samples, and (iii) systematically contrast Humanities vs. Science students. Finally, grammatical innovation in Russian CTs, especially the rise of non-canonical nominal forms and growing analyticity in colour vocabulary, merits dedicated investigation. Our raw data are available (Tables S1 and S2), and we invite interested linguists to further examine novel phenomena using our data.

Conclusions

Using elicited lists from Kazan and Smolensk, we mapped the Russian colour lexicon assessed by the terms’ frequency, cognitive salience, and conceptual associations. Our findings confirm 12 Russian BCTs (including two ‘blues’ sinij and goluboj), identify birûzovyj ‘turquoise’ as an emerging basic term, and show strong consensus around a shared core colour inventory of about 25 terms. Richness of hyponyms, polylexemic terms and phrasal units, and morphological productivity reflect Russian speakers’ awareness of colour nuances and the need for efficient communication of these, while the recent colouristic neologism boom highlights the dynamism of Russian colour naming.

The Russian case exemplifies how universal perceptual and cognitive pressures interact with culture-specific histories, producing both stability (BCTs, core colour inventory) and innovations (neologisms, phrasal units, adjective truncation). The ongoing lexical refinement of colour space in Russian aligns with cross-language trends, suggesting that globalisation fosters semantic convergence while leaving space for population-specific and local variation.