A layered model for glyph identity and transformation in scripts

Pardede, Raymond; Hosszú, Gábor; Kovács, Ferenc

doi:10.1038/s40494-026-02351-8

Article
Open access
Published: 06 February 2026

Raymond Pardede¹,
Gábor Hosszú¹ &
Ferenc Kovács²

npj Heritage Science volume 14, Article number: 86 (2026)

Abstract

This paper presents a multilayer model for analyzing the identity and transformation of symbols within writing systems (hereafter referred to as scripts). The model comprises five interconnected layers: topology, visual identity, phonetic, semantic, and style. The topology layer defines the geometric and structural attributes of the glyphs of each symbol. The visual identity layer captures canonical features shared across glyph variants of a symbol. The phonetic layer links symbols to sound values, where applicable. The semantic layer situates symbols within their linguistic and cultural contexts. The style layer accounts for graphical variations introduced by instruments, scribal practices, and aesthetic conventions. Together, these layers constitute a general symbol model that can be applied across diverse scripts. As demonstrated through selected case studies, the model supports computational paleography, cross-script comparison, and the analysis of undeciphered inscriptions, advancing the formal modeling of script evolution and facilitating computational comparison and analysis of manuscripts.

Introduction

Scripts (human writing systems) are central to human culture, serving as both a medium of communication and a repository of collective memory. They preserve linguistic expression, transmit knowledge across generations, and embody cultural identity^1,2. Like spoken languages, however, scripts evolve over time. Their development reflects shifts in language, intercultural contact, and advances in writing technologies^2,3. Such evolution manifests in the transformation of symbols, including alphabetic letters, logograms, numerical signs, and decorative motifs that together constitute a script.

Understanding these transformations is essential for both historical linguistics and computational analysis. Variations in symbol form can obscure underlying relationships, complicating the identification of graphemes or the classification of undeciphered inscriptions⁴. At the same time, systematic modeling of symbol change can reveal patterns of script evolution, clarify diachronic correspondences, and support the design of digital tools for computational paleography^5,6,7. Traditional descriptive approaches in epigraphy and paleography, while valuable, often lack the formalism required to capture gradual transformations or to compare symbols across different scripts^1,2,8.

While existing symbol models successfully address structural, visual, and linguistic dimensions, they often treat surface-level variation as incidental. The style layer is introduced to explicitly model systematic graphical variation arising from writing media, material constraints, cultural aesthetics, and scribal practices, which play a central role in paleographic analysis but are not adequately captured by existing approaches.

To address these challenges, this paper introduces a multilayer symbol model, which organizes the properties of symbols into five analytical layers: topology, visual identity, phonetic, semantic, and style⁹.

The topology layer represents the geometric structure of a symbol and the transformations that modify it.
The visual identity layer identifies canonical features shared by symbol variants, ensuring recognizability despite stylistic or structural differences.
The phonetic layer associates symbols with sound values where applicable, linking them to spoken language.
The semantic layer situates symbols within linguistic and cultural contexts, extending their meaning beyond form and sound.
The style layer captures graphical variation introduced by writing instruments, scribal practices, or aesthetic conventions.

By conceptualizing symbols through this layered framework, the model applies not only to graphemes in alphabetic and logographic systems but also to non-phonographic signs such as tamgas and decorative motifs⁹. This broader scope enables a more general treatment of symbol identity and transformation, making the approach relevant to linguistics, paleography, and interdisciplinary fields such as scriptinformatics.

The evolution of scripts is closely tied to broader cultural and technological change. Shifts in historical eras, such as the transition from the Bronze Age to the Iron Age, were accompanied by transformations in how people communicated, leading to changes in the graphic forms of symbols^1,2. Cultural contact also played a decisive role. When new objects or concepts were introduced through exploration or trade, scripts needed to adapt, either by inventing new symbols or by modifying existing ones². In a similar way, the development of new writing media and instruments, for example moving from carving stone to writing on parchment or paper, introduced new possibilities for expression and led to distinctive graphemic forms².

The relationship between spoken and written language has also been a driving force of script evolution. Rogers³ observed that borrowed phonographic systems often begin with a simple one-to-one mapping between sounds and symbols. Over time, however, spoken languages tend to change more quickly than scripts, creating misalignments between phonology and orthography. This relationship between speech and writing has been examined extensively in modern linguistic theory, where scripts are viewed as structured symbol systems mediating between phonetic and semantic domains¹⁰. Orthographic reforms are sometimes undertaken to restore this balance, while in other cases scripts evolve independently and diverge from the language they represent¹¹. When a script is borrowed into a new linguistic environment, further transformations may occur. New symbols may be added to represent unfamiliar sounds, or existing symbols may be dropped when they no longer correspond to phonological units³.

Technological factors also contributed to the transformation of graphemes. The same symbol carved into wood or stone typically takes on angular forms, while the use of pen and ink allows for smoother curves and greater stylistic flexibility². Such material constraints and innovations strongly influenced both the appearance of glyphs and the broader trajectory of script evolution.

In computational research, script recognition has long been studied but remains an inherently complex problem. Optical character recognition (OCR) methods have been developed that rely on predefined font styles or statistical text patterns⁴. While effective for standardized texts, these approaches struggle with short inscriptions, handwritten forms, or scripts that lack established fonts. Padma and Vijaya⁴, for example, emphasized the difficulty of script identification in multilingual documents, where even a single line of text may contain multiple languages. Fuzzy logic–based methods have been employed to address character ambiguity, with fuzzy rule visualization confirming the system’s robustness against blurred and fragmented characters⁵. In such cases, even a short stroke may be interpreted in multiple ways, which underscores the need for robust modeling strategies.

Other computational approaches, such as cellular wave computer algorithms, have also been proposed for handwritten text recognition⁶. These methods highlight the potential of algorithmic and mathematical techniques for handling non-standard or complex glyph forms, although they also face challenges when applied to ancient or undeciphered inscriptions.

Pattern systems, which generalize scripts, comprise symbols, syntactic rules, and layout conventions, yet extend beyond human writing⁸. Pattern evolution, the study of their development, frames scripts and pattern systems as taxa with heritable features, combining data science, evolutionary modeling, and artificial intelligence. Phylogenetic tree reconstruction is applied to track morphological changes, accounting for homoplasy and computational complexity. Dedicated algorithms enhance the accuracy and efficiency of modeling script evolution¹², including phenetic¹³ and cladistic approaches^14,15.

Taken together, these studies demonstrate the need for a systematic framework that accounts not only for the geometric variation of symbols but also for their functional dimensions. The multilayer symbol model proposed in this paper builds on these insights and offers a unified approach that integrates structural, visual, phonetic, and semantic attributes within a coherent framework⁹.

The remainder of this paper outlines the theoretical foundation of the model, its mathematical formulation, and its potential applications in the analysis of script evolution and the recognition of undeciphered inscriptions.

Prior computational work on script analysis and paleography has primarily focused on individual aspects of writing systems, such as stroke extraction, geometric shape similarity, visual-feature clustering, or optical character recognition. These approaches have proven effective for specific tasks, including classification and transcription, but they typically operate on a single representational level and do not address symbol identity as a multi-dimensional semiotic construct.

In contrast, the present study introduces a multilayer symbol model that integrates structural, visual, phonetic, semantic, and stylistic dimensions within a unified framework. While elements of these dimensions have been studied separately in previous work, no existing approach provides a formal model that combines them to analyze symbol transformation and identity across writing systems.

Methods

A symbol in a script may be represented by more than one glyph. Some glyphs can coexist as interchangeable variants, while in other cases, one glyph replaces another as scripts evolve over time. This process of replacement or modification is referred to here as symbol transformation. The goal of the model is to describe and formalize how such transformations occur and how different glyphs can be related to one another within a systematic framework.

The multilayer symbol model provides this framework by dividing symbol attributes into five distinct but interconnected layers: topology, visual identity, phonetic, semantic, and style. This model refines and expands upon the earlier Four-Layer Grapheme Model introduced by Pardede et al.¹⁶, incorporating an additional style layer to account for graphical and cultural variation. Each layer emphasizes a particular aspect of the symbol, ranging from its geometric foundation to its cultural and aesthetic context.

The evolution of scripts can usually be explained using the first four layers of the multilayer symbol model: topology, visual identity, phonetic, and semantic. The fifth layer, the style layer, is unique in that it describes the properties of symbols' glyphs that develop or change in line with fashion and customs. Scripts that evolve independently can be influenced by the same style. One example is the cursive style, which spread gradually across the Middle East in the first millennium AD, affecting a wide variety of scripts.

In this section, the multilayer symbol model is presented using a bottom-to-top approach. We begin with the topology layer, which defines the structural properties of the glyphs of the symbols and the operations that transform one glyph form into another. Building upward, we then introduce the visual identity layer, which captures canonical features shared across glyph variants of the same symbol, followed by the phonetic layer, which connects symbols to sound values. The semantic layer situates symbols within linguistic and cultural contexts, and finally, the style layer explains how writing instruments, materials, and conventions influence appearance. The structure and interrelation of the five layers are illustrated in Fig. 1.

By proceeding from geometric foundations to cultural and stylistic variation, this section demonstrates how each layer builds on the previous one. Together, these layers create a comprehensive model that explains how symbols can vary in form and meaning while maintaining their identity across time and context.

Topology layer

At the most fundamental level, a symbol can be described by the geometric structure of its glyph or glyphs. In the topology layer, a glyph is decomposed into elements, each corresponding to a continuous stroke (from pen-down to pen-up). Relationships among glyphs are then expressed in terms of the operations required to transform one glyph into another.

An example of glyph element identification is shown in Fig. 2. In this figure, the decomposition of a glyph into its constituent elements is illustrated, emphasizing the writing process as the basis for segmentation.

**Fig. 2: Decomposition of a glyph into its constituent structural elements.**
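The decomposition described above can be sketched as a small data structure. This is only an illustrative representation, not the authors' implementation: the names `Element` and `Glyph`, the polyline encoding of a stroke, and the coordinates are all assumptions introduced here.

```python
from dataclasses import dataclass, field

Point = tuple[float, float]


@dataclass
class Element:
    """One continuous stroke (pen-down to pen-up), stored as an ordered polyline."""
    points: list[Point]


@dataclass
class Glyph:
    """A glyph as an ordered list of elements E_1, E_2, ..."""
    elements: list[Element] = field(default_factory=list)

    def element(self, n: int) -> Element:
        """Return E_n using the 1-based indexing of the text."""
        return self.elements[n - 1]


# Hypothetical three-stroke glyph; the coordinates are invented for illustration.
glyph = Glyph([
    Element([(0.0, 0.0), (0.5, 1.0)]),
    Element([(0.5, 1.0), (1.0, 0.0)]),
    Element([(0.25, 0.5), (0.75, 0.5)]),
])
print(len(glyph.elements), glyph.element(3).points)
```

Keeping the elements in an ordered list preserves the writing sequence, which the text treats as the basis for segmentation.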

The topology layer uses a set of basic operators that capture how glyphs can be geometrically modified. These operators formalize actions such as extension, shortening, rotation, mirroring, shifting, insertion, or removal of elements. The complete list of operators is shown in Table 1. Each operator is denoted by a code (${O}_{i}$) and a descriptive name, which will be used consistently in the formulas and examples that follow.

Table 1 Basic operators for the topology layer

Operators can be applied either to the entire glyph or to a single element of that glyph. In a whole-glyph application, the operator ${O}_{i}$ is applied to the entire glyph, modifying all of its elements simultaneously. In its general form, the transformation is defined as:

$${G}^{\prime}={O}_{i}\left(G,\,\mathrm{parameters}\right)$$

(1)

where $G$ denotes the original glyph and ${G}^{{\prime} }$ represents the transformed glyph. As an example, a rotation operator rotates the entire glyph $G$ by an angle $\theta$.

$${G}^{{\prime} }={O}_{3}(G,\theta )$$

(2)

Figure 3 illustrates such a case, where the whole glyph is rotated clockwise.
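A whole-glyph rotation in the sense of Eq. (2) can be sketched as follows. The glyph encoding (a list of elements, each a list of points), the rotation about the origin, and the sign convention (positive angles counterclockwise, so a clockwise turn uses a negative angle) are assumptions of this sketch; the text does not fix them.

```python
import math


def rotate_point(point, theta_deg):
    """Rotate a point counterclockwise about the origin by theta_deg degrees."""
    t = math.radians(theta_deg)
    x, y = point
    return (x * math.cos(t) - y * math.sin(t),
            x * math.sin(t) + y * math.cos(t))


def rotate_glyph(glyph, theta_deg):
    """Whole-glyph rotation: every point of every element turns by the same angle."""
    return [[rotate_point(p, theta_deg) for p in element] for element in glyph]


# Hypothetical two-element glyph; each element is a list of (x, y) points.
G = [[(1.0, 0.0), (2.0, 0.0)], [(0.0, 1.0), (0.0, 2.0)]]
G_prime = rotate_glyph(G, -90)  # negative angle = clockwise under this convention
```

Because the same angle is applied to all elements, the relative arrangement of the strokes is preserved.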

In an element-level application, the operator is applied to an individual element of the glyph rather than to the glyph as a whole. In general form, the transformation can be expressed as:

$${G}^{\prime}={O}_{i}\left(G\left({E}_{n}\right),\,\mathrm{parameters}\right)$$

(3)

where ${E}_{n}$ denotes the n-th element of the glyph $G$, and the operator ${O}_{i}$ acts only on this element while leaving all other elements unchanged. As an illustrative example, a rotation operator may be applied to a single element. In this case, the second element ${E}_{2}$ is rotated by an angle of $60^\circ$, while the remaining elements remain fixed.

$${G}^{{\prime} }={O}_{3}(G\left({E}_{2}\right),\,60^\circ )$$

(4)

Figure 4 shows such a transformation, where only one element of the glyph is rotated.
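The element-level case of Eq. (4) can be sketched in the same spirit. The helper names, the choice of the element's first point as the pivot, and the counterclockwise sign convention are assumptions made for illustration only.

```python
import math


def rotate_element(element, theta_deg):
    """Rotate one element counterclockwise about its first point (assumed pivot)."""
    t = math.radians(theta_deg)
    cx, cy = element[0]
    return [(cx + (x - cx) * math.cos(t) - (y - cy) * math.sin(t),
             cy + (x - cx) * math.sin(t) + (y - cy) * math.cos(t))
            for x, y in element]


def apply_to_element(glyph, n, op, *params):
    """Apply op to element E_n (1-based) and leave all other elements unchanged."""
    return [op(e, *params) if i == n else e
            for i, e in enumerate(glyph, start=1)]


# Hypothetical glyph with two straight elements.
glyph = [[(0.0, 0.0), (0.0, 1.0)], [(0.0, 0.0), (1.0, 0.0)]]
rotated = apply_to_element(glyph, 2, rotate_element, 60)
```

Only the second element changes; the first is returned untouched, which mirrors the statement that the remaining elements stay fixed.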

This distinction is crucial, since many historical transformations affect only part of a glyph while leaving the rest unchanged. More complex transformations can be modeled as an operator sequence. Let

$$T=[{O}_{{i}_{1}},\,{O}_{{i}_{2}},\ldots ,{O}_{{i}_{m}}]$$

(5)

denote the transformation consisting of $m$ operators applied in order. Then the transformation from glyph ${G}_{1}$ to glyph ${G}_{2}$ is expressed as:

$${G}_{2}=T({G}_{1})$$

(6)

This means that the operator sequence $T$ acts on glyph ${G}_{1}$, producing ${G}_{2}$.
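Reading Eqs. (5) and (6) as left-to-right application, an operator sequence can be sketched as a fold over a list of functions. The two toy operators below merely stand in for the shifting and mirroring operators named in the text; their names, parameters, and the glyph encoding are assumptions.

```python
from functools import reduce


def apply_sequence(glyph, sequence):
    """Apply the operators of T = [O_i1, ..., O_im] to the glyph in list order."""
    return reduce(lambda g, op: op(g), sequence, glyph)


def shift_right(glyph):
    """Toy shift operator: move every point one unit to the right."""
    return [[(x + 1.0, y) for x, y in e] for e in glyph]


def mirror(glyph):
    """Toy mirroring operator: reflect every point across the vertical axis."""
    return [[(-x, y) for x, y in e] for e in glyph]


G1 = [[(0.0, 0.0), (1.0, 0.0)]]
a = apply_sequence(G1, [shift_right, mirror])
b = apply_sequence(G1, [mirror, shift_right])
print(a, b)
```

The two results differ, which shows why the order of operators in $T$ has to be recorded as part of the transformation.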

To illustrate the application of the proposed operators, a worked example is presented below. A transformation from a glyph ${G}_{1}$ to glyph ${G}_{2}$ is illustrated in Fig. 5. This figure shows the decomposition of the glyph into its elements, which allows us to identify components such as ${E}_{1}$, ${E}_{2}$, and ${E}_{3}$.

Based on that structure, the transformation can be expressed as the operator sequence:

$$T=[{O}_{1}\left(G\left({E}_{3}\right),\,40\% \right),\,{O}_{5}\left(G\left({E}_{2}\right),\,\mathrm{up}\right),\,{O}_{3}\left(G\left({E}_{2}\right),\,315^\circ \right)]$$

(7)

Applying this sequence to ${G}_{1}$:

$${G}_{2}=T({G}_{1})$$

(8)

The transformation proceeds in three sequential operations, as illustrated in Fig. 6. First, element ${E}_{3}$ is extended by 40 percent. Next, element ${E}_{2}$ is shifted vertically upward. Finally, element ${E}_{2}$ is rotated by 315 degrees.

Together, Figs. 5 and 6 illustrate how a glyph can be systematically transformed into another by applying a defined operator sequence.
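The three steps of Eq. (7) can be traced numerically with a minimal sketch. Everything concrete here is an assumption: the coordinates of ${G}_{1}$, straight two-point elements, extension that keeps the start point fixed, a shift step of 0.5 units, and counterclockwise rotation about the element's first point.

```python
import math


def extend(e, pct):
    """O_1: lengthen a straight element by pct percent, keeping its start fixed."""
    (x0, y0), (x1, y1) = e[0], e[-1]
    k = 1 + pct / 100
    return [(x0, y0), (x0 + k * (x1 - x0), y0 + k * (y1 - y0))]


def shift(e, direction, step=0.5):
    """O_5: translate an element; the step size is an assumed default."""
    dx, dy = {"up": (0.0, step), "down": (0.0, -step),
              "left": (-step, 0.0), "right": (step, 0.0)}[direction]
    return [(x + dx, y + dy) for x, y in e]


def rotate(e, deg):
    """O_3: rotate an element counterclockwise about its first point."""
    t = math.radians(deg)
    cx, cy = e[0]
    return [(cx + (x - cx) * math.cos(t) - (y - cy) * math.sin(t),
             cy + (x - cx) * math.sin(t) + (y - cy) * math.cos(t))
            for x, y in e]


def on_element(n, op, *params):
    """Build an operator that acts only on element E_n (1-based)."""
    return lambda g: [op(e, *params) if i == n else e
                      for i, e in enumerate(g, start=1)]


# Hypothetical G_1 with elements E_1, E_2, E_3.
g1 = [[(0.0, 0.0), (0.0, 2.0)], [(0.0, 0.0), (1.0, 0.0)], [(0.0, 1.0), (1.0, 1.0)]]
T = [on_element(3, extend, 40), on_element(2, shift, "up"), on_element(2, rotate, 315)]

g2 = g1
for op in T:
    g2 = op(g2)
print(g2)
```

Each entry of `T` corresponds to one term of Eq. (7), so the list can be read side by side with the formula.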

The topology layer provides a rigorous description of the structural modifications that link glyph variants. However, structure alone does not fully explain why different glyphs are still recognized as representing the same symbol. To address this, the next layer of the model introduces the concept of visual identity, which captures the canonical features that remain stable across topological transformations.

Visual identity layer

The visual identity layer explains why glyphs that differ structurally are still perceived as representing the same symbol. It does so by identifying the canonical features that remain stable across different forms. These features allow scribes, readers, and computational systems to recognize symbolic identity despite variation in execution.

For example, variants of the Latin letter “A” may differ in curvature, stroke thickness, or stylistic detail, yet they all share the canonical arrangement of an apex above and two supporting strokes below. It is this stable configuration that constitutes the symbol’s visual identity, regardless of superficial topological differences.

Canonical features represent the minimal set of visual characteristics that define a symbol and ensure its recognizability across glyph variants. Such features include geometric relations (e.g., parallelism, intersection, enclosure), characteristic shapes (e.g., loops, arcs, crosses), and spatial arrangements, such as a vertical stroke supporting a diagonal element.

Figure 7 illustrates this principle. The glyphs shown differ in their topological form, but all preserve a recognizable arrangement of canonical features. Despite distortion or stylistic modification, the core pattern is retained, enabling consistent recognition.

Two glyphs are considered variants of the same symbol in the visual identity layer if they share the same set of canonical features. Differences in topological detail, such as extended or shortened strokes, do not alter this identity as long as the essential pattern remains intact.

This can be formalized using a similarity function, defined as the Jaccard index of the two feature sets:

$$S({G}_{1},\,{G}_{2})=\frac{\left|F\left({G}_{1}\right)\cap F\left({G}_{2}\right)\right|}{\left|F\left({G}_{1}\right)\cup F\left({G}_{2}\right)\right|}$$

(9)

where $F(G)$ is the set of canonical features of a glyph $G$. The value of $S({G}_{1},\,{G}_{2})$ ranges from 0 (no features shared) to 1 (all features shared). Two glyphs belong to the same visual identity class if:

$$S({G}_{1},{G}_{2})\ge \tau$$

(10)

where $\tau$ is a predefined threshold.

As a worked example, consider two hypothetical glyphs. The first glyph, denoted ${G}_{1}$, is characterized by the feature set $F\left({G}_{1}\right)=\{\text{vertical stroke},\,\text{apex},\,\text{crossbar}\}$. The second glyph, denoted ${G}_{2}$, is characterized by the feature set $F\left({G}_{2}\right)=\{\text{vertical stroke},\,\text{apex},\,\text{diagonal stroke}\}$. The sets share two features:

$$F\left({G}_{1}\right)\cap F\left({G}_{2}\right)=\{\text{vertical stroke},\,\text{apex}\}$$

(11)

and together contain four distinct features:

$$F\left({G}_{1}\right)\cup F\left({G}_{2}\right)=\{\text{vertical stroke},\,\text{apex},\,\text{crossbar},\,\text{diagonal stroke}\}$$

(12)

Thus, the similarity score is:

$$S({G}_{1},{G}_{2})=\frac{2}{4}=0.5$$

(13)

If we set the threshold at $\tau =0.5$, then ${G}_{1}$ and ${G}_{2}$ are considered variants of the same symbol. If a higher threshold, such as $\tau =0.7$, is applied, they would instead be classified as different symbols.
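The worked example can be reproduced with a few lines of code. The function names and the handling of two empty feature sets (returning 0.0) are assumptions of this sketch; the feature sets are those of the example.

```python
def similarity(f1: set, f2: set) -> float:
    """Eq. (9): shared canonical features divided by all distinct features."""
    union = f1 | f2
    if not union:
        return 0.0  # assumed convention; the text does not cover empty sets
    return len(f1 & f2) / len(union)


def same_identity(f1: set, f2: set, tau: float) -> bool:
    """Eq. (10): same visual identity class when the similarity reaches tau."""
    return similarity(f1, f2) >= tau


F_G1 = {"vertical stroke", "apex", "crossbar"}
F_G2 = {"vertical stroke", "apex", "diagonal stroke"}

print(similarity(F_G1, F_G2))          # 0.5
print(same_identity(F_G1, F_G2, 0.5))  # True
print(same_identity(F_G1, F_G2, 0.7))  # False
```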

Figure 8 illustrates a representative example in which glyphs differ significantly in stroke execution, yet their sets of canonical features overlap sufficiently to satisfy the similarity criterion.

This demonstrates how the visual identity layer groups visually distinct forms under a single symbolic identity.

The threshold parameter τ is introduced to distinguish between glyphs that can be regarded as variants of the same symbol and those that represent distinct symbols. Conceptually, τ defines the minimum similarity that two glyph representations must reach to be grouped together, which is equivalent to bounding the admissible deviation between them, particularly in the topology and visual identity layers. Because writing systems differ substantially in their graphical complexity, calligraphic conventions, and degree of stylistic variation, τ cannot be defined as a universal constant. Instead, it must be adapted to the characteristics of the specific script under analysis.

Varying the value of τ directly affects the granularity of variant grouping. Under the criterion of Eq. (10), higher threshold values result in stricter distinctions between glyphs, potentially separating stylistic variants into distinct groups, whereas lower values allow broader grouping at the risk of merging structurally distinct forms. This behavior reflects an inherent trade-off between sensitivity and generalization, which is common in classification and clustering tasks.
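How the choice of τ changes the grouping produced by Eq. (10) can be explored with a small sketch. The grouping rule used here (linking every pair that satisfies Eq. (10) and taking connected groups) is an assumption, since the text defines only the pairwise criterion; glyph `G3` and its features are invented for illustration.

```python
from itertools import combinations


def jaccard(a: set, b: set) -> float:
    """Similarity of Eq. (9) for two non-empty feature sets."""
    return len(a & b) / len(a | b)


def group(features: dict, tau: float) -> list:
    """Link glyphs whose similarity reaches tau and return the connected groups."""
    parent = {name: name for name in features}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(features, 2):
        if jaccard(features[a], features[b]) >= tau:
            parent[find(a)] = find(b)

    groups = {}
    for name in features:
        groups.setdefault(find(name), []).append(name)
    return sorted(sorted(g) for g in groups.values())


features = {
    "G1": {"vertical stroke", "apex", "crossbar"},
    "G2": {"vertical stroke", "apex", "diagonal stroke"},
    "G3": {"loop", "arc"},  # hypothetical unrelated glyph
}
for tau in (0.5, 0.7):
    print(tau, group(features, tau))
```

Running the loop over a range of τ values gives a quick view of where groups split, which may help when adapting the threshold to a particular script.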

In the present theoretical framework, τ is treated as a heuristic parameter whose role is to support formal reasoning about symbol identity rather than to provide an optimized numerical solution. In future work, artificial neural networks (ANNs) will be employed to simulate the multilayer model, allowing τ-like decision boundaries to be learned implicitly from data. Such ANN-based approaches will enable empirical evaluation of threshold sensitivity while preserving the conceptual structure introduced in this study.

Further illustrations of the visual identity layer are given in Figs. 9 and 10. Figure 9 demonstrates a glyph’s transformation over time. Despite structural changes across stages, canonical features persist, preserving visual identity.

Figure 10 presents the Common Identity template, listing the parameters used to define a symbol’s canonical features for consistent annotation and comparison.

Together, these examples highlight both the inclusiveness and the discriminating power of the visual identity layer: it recognizes variation when identity is preserved but also prevents conflating glyphs that do not share the necessary core features.

The visual identity layer has important implications for both theoretical and applied domains. It provides a criterion for grouping glyph variants that differ topologically but share the same symbolic identity. From a paleographic perspective, it explains how scribes and readers of historical texts were able to recognize symbols consistently despite variation caused by handwriting style, tools, or writing media. In computational contexts, it supports the development of algorithms that classify glyph images based on canonical features, thereby reducing the impact of superficial distortions and noise. By abstracting away from surface form, the visual identity layer also enables comparison of symbols across different scripts.

Through these applications, the visual identity layer bridges the gap between structural form and symbolic recognition.

By defining canonical features, the visual identity layer guarantees stable recognition of symbols across variants. This recognition forms the foundation for higher layers of the model, in which symbols are linked to phonetic values and semantic contexts. The next section, therefore, introduces the phonetic layer, which associates symbols with sounds.

Phonetic layer

The phonetic layer extends the model by associating visually defined symbols with their corresponding sound values. While the topology and visual identity layers ensure that different glyph variants can be recognized as the same symbol, the phonetic layer assigns this symbol a pronunciation or phonemic value within a given language.

This layer is essential for understanding the linguistic function of scripts: it explains how visual symbols participate in spoken language by serving as carriers of phonetic information.

Each grapheme class defined in the visual identity layer is mapped to one or more phonetic values. The mapping between graphemes and phonemes may vary depending on the script and historical context. In some cases, a one-to-one correspondence is observed, where a single grapheme consistently represents a single phoneme. In other cases, a one-to-many correspondence occurs, whereby a grapheme may represent different phonemes depending on positional or contextual factors. Conversely, a many-to-one correspondence may arise when multiple graphemes represent the same phoneme.

Formally, let $\complement$ be a canonical identity class (from the visual identity layer). The phonetic function $\Phi$ maps this class to a set of phonetic values:

$$\Phi \left(\complement \right)=\{{p}_{1},{p}_{2},\ldots ,{p}_{k}\}$$

(14)

where ${p}_{i}$ are phonetic units (e.g., phonemes or syllables).
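The phonetic function of Eq. (14) can be sketched as a lookup from identity classes to sets of phonetic values. The table below is not taken from the scripts discussed in this paper; it uses English spellings of Latin letters purely as a familiar stand-in (the letter "c" is read /k/ in "cat" and /s/ in "city"), and all names are assumptions.

```python
# Hypothetical mapping from identity classes to sets of phonetic values.
PHI = {
    "c": {"k", "s"},  # one-to-many: the value depends on context
    "k": {"k"},       # one-to-one
    "m": {"m"},       # one-to-one
}


def phi(identity_class: str) -> set:
    """Eq. (14): phonetic values of a class; empty when none are known."""
    return PHI.get(identity_class, set())


def graphemes_for(phoneme: str) -> set:
    """Inverse lookup, exposing many-to-one correspondences."""
    return {c for c, values in PHI.items() if phoneme in values}


print(phi("c"))            # values of a one-to-many grapheme
print(graphemes_for("k"))  # several graphemes sharing one phoneme
print(phi("?"))            # unknown or undeciphered class: empty set
```

Returning an empty set for an unknown class lets the other layers keep operating when no phonetic information is available.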

Figure 11 provides an example of the phonetic layer in action. It shows how graphemes sharing the same visual identity are linked to a phonetic value in a given language. This demonstrates the process of moving from visual recognition to linguistic function.

In historical scripts, such as the Székely-Hungarian Rovash² or South Semitic¹⁷ scripts, a single canonical grapheme may represent multiple phonetic values depending on context. This flexibility highlights the need for a structured phonetic layer within the model.

The phonetic layer accounts for variability in symbol-to-sound correspondences across different scripts and historical contexts. Diachronic change may lead to shifts in grapheme-to-phoneme mappings over time as a result of sound change or orthographic reform. In multilingual or contact settings, the same grapheme may carry distinct phonetic values depending on language use. In the case of undeciphered scripts, hypotheses about phonetic values depend on recognizing grapheme identity classes and linking them to plausible sound correspondences.

By connecting grapheme identity classes to phonetic values, the phonetic layer forms the bridge between symbol recognition and language. The next section, therefore, introduces the semantic layer, which links symbols not only to sounds but also to meaning.

The phonetic layer is applied only where phonetic information is available or reconstructible; in cases of undeciphered or partially understood scripts, the remaining layers remain fully operative and informative.

Semantic layer

The semantic layer extends the model by linking symbols, already associated with phonetic values, to meaning. While the phonetic layer explains how graphemes represent sounds, the semantic layer explains how these symbols participate in the construction of words, morphemes, or semantic categories. This layer captures the communicative function of scripts by anchoring symbols in language and context.

Each grapheme class may correspond to one or more semantic values, depending on language structure, historical stage, and context. In some cases, a direct mapping is observed, where symbols represent whole morphemes or logograms, as in Chinese characters. In other cases, an indirect mapping occurs, in which symbols represent phonemes and acquire meaning only through their combination into words, as in alphabetic scripts. Additionally, polysemy may arise when the same grapheme carries multiple meanings depending on contextual factors.

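Formally, let $\complement$ be a canonical identity class, and let $\mathrm{M}$ denote the set of possible meanings. The semantic function $S$, which is distinct from the similarity function $S({G}_{1},\,{G}_{2})$ of Eq. (9), maps $\complement$ to a subset of $\mathrm{M}$: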

$$S\left(\complement \right)=\{{m}_{1},{m}_{2},\ldots ,{m}_{k}\}$$

(15)

where ${m}_{i}$ are semantic units, such as morphemes, lexemes, or categories.

In the Székely-Hungarian Rovash script, a grapheme identified and phonetically mapped in previous layers may contribute to different word forms. For instance, a grapheme corresponding to the phoneme /k/ may participate in words meaning “stone,” “hand,” or “king,” (in Hungarian: kő, kéz, and király, respectively) depending on lexical context.

In South Semitic scripts, graphemes similarly encode phonetic values that, when combined, form lexemes with identifiable meanings. The semantic layer thus situates symbols in the broader system of linguistic communication.
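The contrast between direct and indirect mapping can be sketched with two small tables. The Chinese character 山 ("mountain") illustrates direct mapping, and the Hungarian words are those cited above; using the Latin letter "k" as a stand-in for the corresponding Rovash grapheme, and all function and variable names, are assumptions of this sketch.

```python
# Direct mapping: a logogram carries meaning by itself.
SEMANTIC = {"山": {"mountain"}}

# Indirect mapping: meaning arises only from words built out of graphemes.
LEXICON = {"kő": "stone", "kéz": "hand", "király": "king"}


def semantic_values(identity_class: str) -> set:
    """Eq. (15): meanings attached directly to a class (empty if none)."""
    return SEMANTIC.get(identity_class, set())


def meanings_involving(grapheme: str) -> set:
    """Meanings of the lexicon words in which the grapheme participates."""
    return {meaning for word, meaning in LEXICON.items() if grapheme in word}


print(semantic_values("山"))
print(semantic_values("k"))
print(meanings_involving("k"))
```

The alphabetic grapheme has no semantic value of its own, yet it participates in several lexemes, which is the indirect mapping described in the text.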

The semantic layer highlights the role of writing as a meaning-bearing system. It accounts for lexical encoding by showing how graphemes contribute to word construction, and it explains semantic disambiguation by describing how context resolves ambiguity when a grapheme carries multiple meanings. The semantic layer also addresses historical semantics by demonstrating how semantic values associated with symbols may shift over time. Furthermore, it enables cross-script comparison by supporting analysis of how different writing systems encode meaning, such as in logographic versus alphabetic systems.

The semantic layer ensures that symbols are not only recognized and pronounced but also understood. Yet, meaning can be expressed with stylistic variation, which reflects cultural, aesthetic, or scribal traditions. The next section, therefore, introduces the style layer, which extends the model by incorporating the stylistic dimension of writing.

Style layer

The style layer of the multilayer symbol model encompasses aesthetic, cultural, fashion-related, and scribal variations that influence the depiction of symbols within a certain script. Whereas the topology, visual identity, phonetic, and semantic layers ensure structural recognition, symbolic identity, sound association, and meaning, the style layer captures variations in the practical rendering of these symbols.

This layer acknowledges that writing is not only a vehicle for communication but also an artistic and cultural artifact. Symbols may be expressed differently depending on medium, tool, period, region, or the intention of the scribe. The style layer is significant because scripts that are not evolutionarily related to each other can develop in similar directions under the prevailing fashion of a given era. One example is the mutual influence of Latin and Greek scripts in Europe during the Middle Ages.

The style layer is explicitly restricted to surface-level graphical variation that does not modify the topological structure of glyphs or the canonical features defining visual identity. Stylistic variation reflects systematic influences such as writing media, material constraints, cultural aesthetics, and scribal conventions. Factors are incorporated into this layer only when they produce recurrent and interpretable variation across glyph instances, rather than incidental or random deviations. In this way, the style layer complements the topological and visual identity layers without overlapping their core representational functions.

Stylistic variation can be analyzed along multiple dimensions. These include the medium and material of inscription, such as stone carving, ink on parchment, or digital rendering, as well as the influence of writing tools, where brushes, chisels, pens, or styluses produce distinctive stylistic effects. Stylistic variation is also shaped by cultural aesthetics, including calligraphic traditions, decorative embellishments, and symbolic ornamentation. In addition, scribal individuality contributes personal handwriting habits and idiosyncrasies, while period style reflects the evolution of stylistic norms across historical periods.

These dimensions may combine, resulting in layered stylistic effects within the same grapheme. Unlike the lower layers, style does not alter the canonical identity of a symbol, its phonetic value, or its semantic content. Instead, it operates as a surface-level transformation that influences perception, legibility, and cultural value.

Formally, let $\complement$ be a canonical identity class and $\mu \in S\left(\complement \right)$ a semantic unit. The style layer defines a stylistic transformation $\sigma$ such that:

$$\sigma (\complement ,\mu )\to {\complement }^{* } \tag{16}$$

where ${\complement }^{* }$ is a styled rendering of the canonical class, differing in appearance but identical in identity, phonetic value, and meaning.
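A minimal sketch of this invariance is given below. The `StyledSymbol` record, the `apply_style` function, and all labels are hypothetical names introduced for illustration, not part of the authors' formalism; the sketch only shows one way to encode a transformation that may change appearance while leaving identity, phonetic value, and meaning untouched.

```python
from dataclasses import dataclass, replace
from typing import FrozenSet

@dataclass(frozen=True)
class StyledSymbol:
    identity: str               # canonical identity class
    phonetic: str               # sound value linked to the class
    meaning: str                # semantic unit (mu)
    appearance: FrozenSet[str]  # surface traits; the only field style may change

def apply_style(symbol: StyledSymbol, traits: FrozenSet[str]) -> StyledSymbol:
    """Hypothetical sigma: return a styled rendering of the same class."""
    return replace(symbol, appearance=symbol.appearance | traits)

def same_symbol(a: StyledSymbol, b: StyledSymbol) -> bool:
    """Two renderings are the same symbol if the non-style fields agree."""
    return (a.identity, a.phonetic, a.meaning) == (b.identity, b.phonetic, b.meaning)

# Placeholder labels; no real script data is implied.
base = StyledSymbol("class_A", "sound_a", "unit_1", frozenset())
carved = apply_style(base, frozenset({"angular_strokes"}))
inked = apply_style(base, frozenset({"rounded_strokes", "stroke_modulation"}))
```

Here `carved` and `inked` differ in `appearance` yet compare equal under `same_symbol`, which mirrors the requirement that a styled rendering differs in appearance but not in identity, phonetic value, or meaning.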

Historical scripts demonstrate the importance of style in differentiating writing traditions. In medieval manuscripts, the same Latin grapheme may appear in Carolingian minuscule, Gothic Textura, or Humanist cursive, each of which is stylistically distinct while remaining semantically identical. In calligraphic traditions, such as Arabic or Chinese writing, style is elevated to an art form, with scripts distinguished by stroke modulation, rhythm, and ornamentation. In epigraphic contexts, stylistic choices often reflect material constraints, for example, the angular forms characteristic of writing carved into wood.

These examples show how style adds a cultural and aesthetic dimension to the symbol model without disrupting its structural or linguistic layers.

The style layer enriches the model in several ways. It enables systematic cultural analysis by supporting the comparison of stylistic schools, traditions, and historical periods. In the context of digital paleography, it supports the development of algorithms for classifying scribal hands and identifying stylistic trends in manuscript corpora. The style layer also contributes to the decipherment of unknown scripts by helping distinguish stylistic variation from genuine graphemic differences. Finally, it acknowledges the artistic and cultural dimensions of writing systems, emphasizing the preservation of textual heritage beyond purely functional considerations.

The addition of the style layer completes the multilayer symbol model and provides a comprehensive account of scripts as both communicative and cultural phenomena. Beyond style, further levels of analysis may address pragmatic or contextual aspects of writing. The five layers presented here (topology, visual identity, phonetic, semantic, and style) form a robust foundation for scriptinformatics.

Results

This section demonstrates the application of the multilayer symbol model to concrete inscriptions. By analyzing case studies from different writing traditions, the universality of the model can be tested across materials, scripts, and contexts. The following case studies are based on documented inscriptions and selected glyph sets and are intended to demonstrate the qualitative applicability and interpretability of the multilayer symbol model rather than to provide quantitative performance evaluation.

The Székely-Hungarian Rovash inscription is examined in detail to illustrate how each layer of the model contributes to symbol analysis. In addition, two examples from the South Semitic script family are presented: one inscription accompanied by a camel illustration and one Safaitic inscription with a known decipherment. Together, these cases show how the model can be applied both to fully deciphered scripts and to inscriptions where meaning is inferred through contextual or visual evidence.

Purpose of case studies

The purpose of the case studies is twofold. First, they demonstrate how the multilayer symbol model operates in practice, connecting abstract formal definitions with real inscriptions. Second, they highlight the model’s flexibility in handling different types of data: a detailed analysis of a single script (Székely-Hungarian Rovash) and comparative illustrations from another script family (South Semitic). By combining these perspectives, the case studies provide evidence that the model can serve as a general framework for scriptinformatics.

Taken together, these case studies demonstrate how the multilayer symbol model can be applied in practical heritage research scenarios, including the comparison of glyph variants, the interpretation of inscriptions under uncertain linguistic conditions, and the analysis of stylistic variation across materials and cultural contexts.

A Székely-Hungarian Rovash inscription

The following analysis applies the multilayer symbol model introduced in “Methods” to the Székely-Hungarian Rovash inscription. The Székely-Hungarian Rovash script provides a valuable case study due to its complex history and variation⁹. Recent computational approaches have also been applied to the analysis of Rovash inscriptions, revealing structural correspondences and historical relationships among glyph forms¹⁸. Figure 12 introduces the inscription under study.

The analysis proceeds by examining the inscription successively at the topological, visual identity, phonetic, semantic, and stylistic levels.

Following the multilayer symbol model introduced in “Methods,” the Székely-Hungarian Rovash glyphs in the inscription are first analyzed at the topological level by decomposing them into basic structural elements and describing their relations using geometric transformations such as extension, rotation, and shifting. Despite observable structural variation, glyphs exhibiting shared canonical features (such as parallel strokes, diagonal arrangements, and enclosed forms) are grouped at the visual identity level, ensuring consistent recognition across variants. Where phonetic values are attested in the Székely-Hungarian Rovash writing, these canonical classes are associated with corresponding sound units, enabling phonetic interpretation of the inscription. At the semantic level, graphemes combine into meaningful units, allowing interpretation of the linguistic content within its cultural and historical context. Finally, stylistic characteristics are interpreted in relation to material and production techniques, with angular strokes reflecting the constraints of carving into hard surfaces while preserving consistency with canonical Székely-Hungarian Rovash forms.
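The interplay between topological transformations and canonical features can be sketched as follows. The stroke encoding (straight segments given by two endpoints), the helper names, and the choice of "number of parallel stroke pairs" as a canonical feature are assumptions made for illustration; they are not the formal definitions from "Methods".

```python
import math
from itertools import combinations

# A stroke is a straight segment given by its two endpoints (assumed encoding).
Stroke = tuple[tuple[float, float], tuple[float, float]]

def extend(strokes: list[Stroke], index: int, factor: float) -> list[Stroke]:
    """Lengthen one stroke from its first endpoint by the given factor."""
    (x1, y1), (x2, y2) = strokes[index]
    longer = ((x1, y1), (x1 + factor * (x2 - x1), y1 + factor * (y2 - y1)))
    return strokes[:index] + [longer] + strokes[index + 1:]

def rotate(strokes: list[Stroke], degrees: float) -> list[Stroke]:
    """Rotate every endpoint about the origin by the given angle."""
    t = math.radians(degrees)
    c, s = math.cos(t), math.sin(t)
    return [tuple((c * x - s * y, s * x + c * y) for x, y in st) for st in strokes]

def shift(strokes: list[Stroke], dx: float, dy: float) -> list[Stroke]:
    """Translate every endpoint by (dx, dy)."""
    return [tuple((x + dx, y + dy) for x, y in st) for st in strokes]

def parallel_pairs(strokes: list[Stroke], tol: float = 1e-9) -> int:
    """Count stroke pairs whose direction vectors have a near-zero cross product."""
    def direction(st):
        (x1, y1), (x2, y2) = st
        return x2 - x1, y2 - y1
    count = 0
    for a, b in combinations(strokes, 2):
        (ax, ay), (bx, by) = direction(a), direction(b)
        if abs(ax * by - ay * bx) <= tol:
            count += 1
    return count

# Hypothetical glyph: two vertical strokes joined by a diagonal.
glyph = [((0.0, 0.0), (0.0, 1.0)), ((1.0, 0.0), (1.0, 1.0)), ((0.0, 0.0), (1.0, 1.0))]
variant = shift(rotate(extend(glyph, 2, 1.5), 30), 5, -2)
```

Both `glyph` and `variant` contain one pair of parallel strokes, so a grouping step keyed on this feature would place them in the same class even though their coordinates differ.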

A South Semitic inscription with a camel illustration

The following analysis applies the multilayer symbol model introduced in “Methods” to a South Semitic inscription accompanied by a camel illustration.

The South Semitic script family provides a useful comparative case due to its different graphical conventions and writing context. The inscription analyzed here is accompanied by a camel illustration, which contributes additional contextual information. Figure 13 presents the inscription and the associated imagery.

At the topological level, the glyphs of the inscription exhibit linear and angular stroke structures characteristic of South Semitic writing, reflecting the constraints of carving on stone surfaces. At the visual identity level, recurring stroke arrangements and relative proportions allow grouping of glyph variants despite minor executional differences. Phonetic interpretation is limited in this case, as the inscription is only partially deciphered; however, the analysis allows tentative association of certain glyph forms with known South Semitic sound values. At the semantic level, the presence of the camel illustration supports an interpretation related to pastoral or nomadic contexts, reinforcing the symbolic meaning of the inscription beyond its textual content. Finally, stylistic variation reflects both the material medium and the local scribal tradition, illustrating how graphical execution interacts with structural identity.

A deciphered example of the South Semitic Safaitic inscriptions

The following analysis applies the multilayer symbol model introduced in “Methods” to a deciphered South Semitic Safaitic inscription. The Safaitic inscription examined here belongs to the South Semitic script family and has a known decipherment, providing an opportunity to evaluate the multilayer model against an established reading. Figure 14 presents the inscription together with its documented transcription and translation. The inscription reads:

“By Aqraban, son of Kasit, son of Saad, the beautiful woman, playing the reed pipes.”

At the topological level, the Safaitic glyphs exhibit linear and angular stroke configurations consistent with carving on stone surfaces. Visual identity analysis reveals stable canonical features across glyph occurrences, enabling reliable grouping of variants despite minor executional differences. Because the inscription is deciphered, phonetic values can be assigned with greater confidence, allowing direct association between canonical glyph classes and corresponding sound units. At the semantic level, the known reading provides contextual meaning, enabling validation of how glyph combinations form coherent linguistic units. Finally, stylistic analysis reflects local carving practices and material constraints, illustrating how graphical execution varies while preserving symbol identity. Together, these observations demonstrate that the multilayer model can accommodate both undeciphered and deciphered inscriptions within a unified analytical framework.
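The contrast between a deciphered and a partially deciphered inscription at the phonetic layer can be sketched in a few lines. The class labels (`C1`, `C2`, `C3`), the sound placeholders (`s1`, `s2`, `s3`), and the `assign_sounds` helper are hypothetical; they do not correspond to actual Safaitic or other South Semitic values.

```python
def assign_sounds(classes, sound_values):
    """Map canonical class labels to sound values; unattested classes stay '?'.

    Returns the reading and the share of glyphs with an assigned value.
    """
    reading = [sound_values.get(c, "?") for c in classes]
    coverage = sum(v != "?" for v in reading) / len(reading) if reading else 0.0
    return reading, coverage

# Placeholder sequence of canonical classes in an inscription.
sequence = ["C1", "C2", "C3", "C1"]

full_table = {"C1": "s1", "C2": "s2", "C3": "s3"}   # deciphered case
partial_table = {"C1": "s1"}                        # partially deciphered case

full_reading, full_cov = assign_sounds(sequence, full_table)
partial_reading, partial_cov = assign_sounds(sequence, partial_table)
```

With the full table every glyph receives a value, while the partial table leaves gaps marked `?`. The lower layers (topology and visual identity) supply the same class sequence in both cases, which is why the model can treat both kinds of inscription uniformly.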

Discussion

The study introduces a multilayer model of glyph identity and transformation that integrates structural, visual, phonetic, semantic, and stylistic dimensions of writing. This model explains how glyphs can change form while retaining their visual identity and connecting to sound, meaning, and stylistic conventions. By combining these layers, the framework provides a comprehensive foundation for analyzing the complexity of writing systems. It strengthens paleographic research by providing formal methods that support cross-script comparison and computational analysis.

The semantic and style layers are not limited to alphabetic writing systems and are applicable to logographic and mixed scripts, where meaning and graphical convention are conveyed through symbolic form, context, and material practice rather than phonetic realization alone.

Potential applications of the multilayer symbol model include the digital preservation of textual heritage in museum and archival collections, where systematic modeling of symbol variation is required, as well as the restoration and interpretation of damaged or fragmentary inscriptions by distinguishing structural identity from stylistic surface variation.

The Rovash case study offers a detailed demonstration of the multilayer symbol model, illustrating how each layer can be employed to describe, compare, and interpret glyphs. The South Semitic examples broaden the scope of the model to another script family, with one inscription accompanied by imagery and one that has been fully deciphered. Together, these case studies demonstrate the universality of the multilayer symbol model. The model captures structural transformations, groups symbols by visual identity, assigns phonetic values, situates them semantically, and acknowledges stylistic variation, regardless of the writing system analyzed.

Future work may extend the model to additional scripts, refine its mathematical formalization, and develop digital applications for large-scale manuscript analysis. The present study establishes a theoretical and formal foundation for multilayer symbol analysis; large-scale experimental validation and recognition-based evaluation are beyond its scope and are planned as part of future data-driven research.

Direct quantitative comparison between the proposed multilayer symbol model and existing computational approaches is not yet meaningful. Most current methods in digital paleography are task-specific and operate on isolated representations, such as visual similarity or stroke geometry, whereas the present model defines symbol identity across multiple interconnected semiotic layers.

As a result, numerical benchmarking at this stage would conflate fundamentally different analytical objectives. The case studies presented here, therefore, serve to demonstrate interpretability and conceptual applicability rather than performance metrics. Once the model has been operationalized using artificial neural network (ANN) simulations and suitable datasets have been compiled, subsequent studies will be able to conduct systematic benchmarking against data-driven methods.

The principal advantage of the multilayer symbol model lies in its ability to integrate multiple semiotic dimensions of writing into a single analytical framework. Unlike methods that focus exclusively on geometric similarity or visual pattern recognition, the proposed model explicitly distinguishes between structural form, visual identity, phonetic association, semantic function, and stylistic variation. This separation allows the analysis to remain robust in the presence of scribal variation, material constraints, and historical change.

The model also supports cross-script comparison by isolating script-dependent properties, such as phonetics, from more general structural and visual features. This enables meaningful comparison between unrelated writing systems at the level of symbol structure and transformation rather than surface appearance alone. From a computational perspective, the formal definitions introduced here provide a basis for implementing data-driven simulations, including ANN-based models, in which layer-specific representations can be learned and combined to study symbol evolution, classification, and transformation in a systematic manner. While the present study does not provide numerical validation of the topology layer or the multilayer hierarchy, the framework is formulated to support computational testing, and its effectiveness will be evaluated in future work using data-driven and ANN-based approaches.

Data availability

All data and materials supporting the conclusions of this article are included within the article or are available in the cited sources.

References

1. Cooper, J. S. Sumerian and Akkadian. In Daniels, P. T. & Bright, W. (eds) The World’s Writing Systems (pp. 37–57) (Oxford University Press, 1996).
2. Hosszú, G. Heritage of Scribes: The Relation of Rovas Scripts to Eurasian Writing Systems (Rovas Foundation, 2013).
3. Rogers, H. Sociolinguistic factors in borrowed writing systems. Tor. Work. Pap. Linguist. 17, 247–262 (1999).
4. Padma, M. C. & Vijaya, P. A. Script identification of text words from a tri-lingual document using voting technique. Int. J. Image Process. 4, 35–52 (2010).
5. Ge, Y., Zhang, Y. & Roh, S.-B. FAIR-net: a fuzzy autoencoder and interpretable rule-based network for ancient Chinese character recognition. Sensors 25, 5928 (2025).
6. Wali, A. CA-NN: a cellular automata neural network for handwritten pattern recognition. Nat. Comput. 24, 173–180 (2025).
7. Tóth, L. L., Pardede, R. E. I., Jeney, G. A., Kovács, F. & Hosszú, G. Preprocessing algorithm for deciphering historical inscriptions using string metric. Int. J. Eng. Technol. Innov. 6, 202–213 (2016).
8. Hosszú, G. Validation of graph sequence clusters through multivariate analysis: application to Rovash scripts. Herit. Sci. 12, 110 (2024).
9. Hosszú, G. Scriptinformatics: Extended Phenetic Approach to Script Evolution (Nap Publishing, 2021).
10. Sampson, G. Writing Systems (2nd ed.). J. Linguist. 51, 700–704 (2015).
11. Malatesha, J. R. & Aaron, P. G. (eds) Handbook of Orthography and Literacy (Lawrence Erlbaum Associates, 2006).
12. Salman, O. A., Hosszú, G. & Kovács, F. A new feature selection algorithm for evolutionary analysis of Aramaic and Arabic script variants. Int. J. Intell. Eng. Inform. 10, 313–331 (2022).
13. Salman, O. A. & Hosszú, G. A phenetic approach to selected variants of Arabic and Aramaic scripts. Int. J. Data Anal. 3, 19 (2022).
14. Salman, O. A. & Hosszú, G. Cladistic analysis of the evolution of some Aramaic and Arabic script varieties. Int. J. Appl. Evolut. Comput. 12, 18–38 (2021).
15. Puskás, T. & Hosszú, G. A cladistic approach to the evolution of steppe scripts. Int. J. Intell. Eng. Inform. 10, 52–73 (2022).
16. Pardede, R. E. I., Tóth, L. L., Jeney, G. A., Kovács, F. & Hosszú, G. Four-layer grapheme model for computational paleography. J. Inf. Technol. Res. 9, 64–82 (2016).
17. Macdonald, M. C. A. On the uses of writing in ancient Arabia and the role of palaeography in studying them. Arab. Epigr. Notes 1, 1–50 (2015).
18. Hosszú, G. & Zelliger, E. Computational paleography of the Bodrog-Alsóbű rovash relic: a reading attempt (in Hungarian) [A Bodrog-alsóbűi rovásemlék számítógépes írástörténeti kapcsolatai és egy olvasati kísérlete]. Magy. Nyelv. J. Soc. Hungarian Linguist. 110, 417–431 (2014).

Acknowledgements

The authors declare that no funding was received for the conduct of this research. The authors thank the Department of Electron Devices at Budapest University of Technology and Economics for their institutional support. Figures 13 and 14 are reproduced from Wikimedia Commons under the Creative Commons Attribution Share-Alike 4.0 International License (CC BY-SA 4.0). No changes were made to the original images.

Funding

Open access funding provided by Budapest University of Technology and Economics.

Author information

Authors and Affiliations

Department of Electron Devices, Budapest University of Technology and Economics, Budapest, Hungary
Raymond Pardede & Gábor Hosszú
Faculty of Information Technology and Bionics, Pázmány Péter Catholic University, Budapest, Hungary
Ferenc Kovács

Contributions

R.P. conceived the study, developed the multilayer symbol model, and drafted the manuscript. G.H. contributed to the theoretical framework and historical/script analysis and provided supervision. F.K. provided senior guidance, led the revision process, and offered substantive recommendations that refined the analysis, structure, and overall clarity of the article. All authors reviewed and approved the final version of the manuscript.

Corresponding authors

Correspondence to Raymond Pardede, Gábor Hosszú or Ferenc Kovács.

Ethics declarations

Competing interests

The authors declare that they have no competing interests. Author G.H. is an Associate Editor of npj Heritage Science. G.H. was not involved in the journal’s review of, or decisions related to, this manuscript.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Pardede, R., Hosszú, G. & Kovács, F. A layered model for glyph identity and transformation in scripts. npj Herit. Sci. 14, 86 (2026). https://doi.org/10.1038/s40494-026-02351-8

Received: 17 October 2025
Accepted: 23 January 2026
Published: 06 February 2026
Version of record: 06 February 2026
DOI: https://doi.org/10.1038/s40494-026-02351-8
