OracleNet: enhancing Oracle Bone Script recognition with Adaptive Deformation and Texture-Structure Decoupling

Zhou, Shu; Wang, Xin; Qiu, Jingwen; Bu, Wenru; Wang, Hao

doi:10.1038/s40494-025-01839-z


Article
Open access
Published: 17 June 2025


npj Heritage Science volume 13, Article number: 273 (2025)


Abstract

Oracle Bone Script, the earliest known form of Chinese writing, plays a significant role in archaeological and historical studies, which makes the recognition of its character images important. However, existing deep learning technologies face challenges in automatically recognizing Oracle Bone Script, including the lack of fine control over local features, the neglect of texture information, and insufficient learning of highly discriminative features. To address these issues, this paper introduces OracleNet, a novel image processing model for Oracle Bone Script. OracleNet consists of an Adaptive Deformation Module, a Texture–Structure Decoupling Module, and a Multi-Level Structured Perceptual Attention Module. The Adaptive Deformation Module enhances local control through adaptive control points while maintaining the semantic integrity of the script; the Texture–Structure Decoupling Module separates texture from structural elements, improving recognition accuracy; and the Multi-Level Structured Perceptual Attention Module refines discriminative differences from macro and micro perspectives. OracleNet has been validated on multiple datasets, achieving state-of-the-art performance on the Oracle-241, OBC306 and Oracle-MNIST datasets, demonstrating the model’s superior accuracy and robustness.


Introduction

Oracle Bone Script, the earliest known form of Chinese writing, is both an ancient script and a crucial historical artifact for understanding the life and times of the Shang Dynasty (circa 1600–1046 BCE). These inscriptions were meticulously carved on turtle shells and animal bones, and they contribute significantly to our understanding of the origins of Chinese civilization¹. In recent years, advances in computer vision, particularly deep convolutional neural networks, have been applied to deciphering these enigmatic scripts, providing invaluable assistance to archaeologists and specialists in ancient scripts.

However, automatic recognition of Oracle Bone Script still faces numerous challenges, mainly: (1) the scarcity of annotated data; (2) the difficulty of effectively separating the complex texture and structural features in Oracle Bone Script images; and (3) the need for highly discriminative features to distinguish visually similar characters and to handle wear and deformation.

Firstly, the scarcity of annotated data is one of the primary challenges. Oracle Bone Script characters are extremely rare, and annotating them requires highly specialized knowledge. This process is both expensive and labor-intensive, which severely limits the available annotated data. Consequently, traditional supervised learning methods struggle to be effective. To alleviate the lack of data, data augmentation methods have been widely employed. However, traditional data augmentation techniques, such as rotation², scaling³, shearing⁴, and flipping⁵, show significant limitations when processing Oracle Bone Script^6,7. This is because the semantic information of Oracle Bone Script relies heavily on its structural features, such as the shape and relative positions of strokes, and simple geometric transformations may disrupt these critical features. To address this, non-rigid data augmentation methods such as Free-Form Deformation (FFD)⁸ and Elastic Transformation (ER)⁹ have been proposed. These methods generate new data samples by altering the shape and alignment of images without directly modifying the image content. However, FFD lacks fine control over complex local features and may fail to preserve the subtle structures of Oracle Bone Script. While ER is effective in adjusting spatial alignment, its fixed regularization parameters cannot adapt to the subtle variations in Oracle Bone Script images, leading to the loss of some local features during augmentation. Additionally, these methods are computationally intensive when processing high-resolution Oracle Bone Script images, making them less suitable for large-scale applications.

Secondly, the complex intertwining of texture and structural features in Oracle Bone Script images poses challenges for feature extraction. Oracle Bone Script rubbings (target domain) and reproductions (source domain) exhibit significant differences in texture and structure. Reproduction data have clear glyphs and primarily contain structural information, whereas rubbing data include complex noise such as blurring and degradation and are rich in texture. If these features are aligned indiscriminately, the abundant texture information may mislead the model, making it difficult to learn domain-invariant features. Most domain adaptation methods^10,11,12 cannot effectively reduce the feature distribution discrepancies between such domains. Some studies¹³ attempt to decouple features to separate texture and structural information but often overlook the role of texture information. For example, UDCN¹⁴ incorporates only structural semantic information into adaptation and neglects the impact of texture information related to image wear, stains, and deformation. MixupAda¹⁵ combines mixup data augmentation with adversarial domain adaptation to improve domain adaptation performance. TransPar¹⁶ is a transformer-based partial domain adaptation approach that focuses on transferring the feature representations shared between the source and target domains. FixBi¹⁷ is a bidirectional alignment method that aims to better bridge the feature spaces of the source and target domains. PRONOUN¹⁸ uses prototypical representations and normalized output conditioners to improve model generalization. BSP¹⁹ balances transferability and discriminability via batch spectral penalization for adversarial domain adaptation.

Lastly, because of different writing styles and the significant variations in the strokes and structures of Oracle Bone Script characters, models need highly discriminative feature learning capabilities. Recognizing Oracle Bone Script characters requires not only distinguishing visually similar characters but also handling subtle differences caused by wear and deformation.

To address the aforementioned three challenges, this paper introduces OracleNet. Specifically, we designed three modules in this network: the Adaptive Deformation Module (ADM), the Texture–Structure Decoupling Module (TSDM), and the Multi-Level Structured Perceptual Attention Module (MLSPAM). The ADM enhances fine local deformation control in Oracle Bone Script images by introducing adaptive control points based on FFD, allowing for subtle adjustments that preserve the key structural features of the characters. This approach enables the model to better adapt to complex local features in the image, reducing semantic information loss due to deformation. The TSDM separates texture and structural features within the image through feature decoupling, enabling the model to better understand and utilize the different types of information present in the image. The MLSPAM enhances the model’s perception of structural features through a multi-level attention mechanism. Utilizing self-attention, this module captures crucial information in the image at both macro and micro levels. At the macro level, the attention mechanism helps the model identify key areas and features within the image; at the micro level, it refines these features further, ensuring the model can distinguish visually similar Chinese characters while dealing with subtle differences caused by wear and deformation. The design of the MLSPAM aims to enhance the model’s adaptability to complex Oracle Bone Script images and improve overall recognition performance. With these three modules, OracleNet can not only precisely handle complex deformations and noise in Oracle Bone Script images but also effectively separate and utilize structural and textural features, significantly enhancing the recognition accuracy and robustness of Oracle Bone Script characters.

The contributions of this paper are primarily as follows:

(1) The ADM in our proposed OracleNet utilizes adaptive control points for finer local deformation adjustments in Oracle Bone Script images. These adjustments not only preserve the key structural features of the Oracle Bone Script characters but also adapt to complex local features within the image, thereby reducing the loss of semantic information due to deformation.

(2) The TSDM in OracleNet effectively separates texture and structural features in Oracle Bone Script images through feature decoupling techniques. This separation allows the model to more accurately understand and utilize different information within the image, enhancing its recognition capability for Oracle Bone Script.

(3) The MLSPAM we designed employs an attention mechanism to capture crucial information from both macro and micro levels of the image. This design not only helps the model identify key areas and features within the image but also refines these features further, ensuring the model can distinguish visually similar Chinese characters and handle subtle differences caused by wear and deformation.

(4) Through extensive experimental validation on multiple Oracle Bone Script datasets, OracleNet significantly outperforms existing methods in terms of recognition accuracy and robustness. Our method shows improvements in the accuracy of Oracle Bone Script character recognition by 2.5% on the Oracle-241 dataset, 1.74% on the OBC306 dataset, and 2.07% on the Oracle-MNIST dataset compared to the previous best methods.

Unlike previous methods, this paper utilizes the structural information of the Oracle Bone Script itself (including the shape, length, and relative positions of strokes) combined with textural information (such as cracks and wear marks) for domain adaptation to reduce performance discrepancies when processing different datasets.

Methods

Problem formulation

Given M labeled handprint source domain samples X^s with corresponding labels Y^s, and N unlabeled target domain topographic samples X^t, there exists a significant distribution discrepancy between the source and target domains, i.e., P(X^s) ≠ P(X^t).

The goal of this study is to train a model G that can generalize well to topographic data by training on both the labeled source domain {X^s, Y^s} and the unlabeled target domain {X^t}. Specifically, the aim is to optimize the model G to minimize the prediction error across both domains, expressed as:

$$\mathop{\min }\limits_{G}\Gamma (G;{X}^{s},{Y}^{s},{X}^{t})=\lambda {\Gamma }^{s}(G;{X}^{s},{Y}^{s})+(1-\lambda ){\Gamma }^{t}(G;{X}^{t})$$

(1)

Here Γ^s and Γ^t represent the loss functions for the source and target domains, respectively, and λ is a hyperparameter that balances the importance of the two domains. Thus, the model G is designed to acquire domain-invariant features that are equally effective in both domains, thereby enhancing performance on unseen target domain topographic data.

Overview

As illustrated in Fig. 1, the proposed model, OracleNet, includes three modules: the ADM, the TSDM, and the MLSPAM, aimed at improving the processing effectiveness and generalization ability of Oracle Bone Script images. The ADM employs adaptive control point technology for precise local adjustments to Oracle Bone Script images. Compared to traditional FFD techniques, adaptive control points dynamically adjust based on the content of the image, more accurately handling complex local features and effectively preserving the intricate structures of the Oracle Bone Script while minimizing information loss during deformation.

In the processing of Oracle Bone Script images, distinguishing between structural and textural features is particularly important. The TSDM effectively separates the textural and structural features of the images. This not only enhances the recognition accuracy of structural features but also allows the model to better handle texture noise caused by image wear and degradation. The MLSPAM utilizes a self-attention mechanism to enhance the perception and recognition of Oracle Bone Script images across multiple levels. By applying the attention mechanism at both macro and micro levels, the module not only identifies key areas and features within the images but also refines these features to differentiate visually similar Chinese characters and adapt to minor deformations and wear in the images.

Adaptive Deformation Module

Traditional FFD techniques control local image deformation by placing a regular grid of control points on the image and moving these control points. Given the special structural features and detail requirements of Oracle Bone Script images, this paper introduces an adaptive control point technique that dynamically adjusts the density and orientation of control points based on the image content, thus achieving more precise local deformation control. Details are shown in Fig. 2. For the handprint source domain samples X^s, the improved FFD transformation function can then be expressed as:

$${T}_{{{ENHANCE}}\_{{FFD}}}({X}^{s})={X}^{s}+\mathop{\sum }\limits_{i=1}^{N({X}^{s})}\Delta {P}_{i}$$

(2)

where N(X^s) is the number of control points based on the handprint source domain sample X^s, and ΔP_i is the displacement vector of control point i.
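As a point of reference, the classical FFD that the ADM builds on spreads each control point's displacement smoothly over the neighbouring pixels. The NumPy sketch below implements only that classical, non-adaptive variant, under stated assumptions: a regular control grid, bilinear interpolation of the grid displacements into a dense field, and backward bilinear resampling of the image. It does not reproduce the adaptive control points of the ADM, whose placement rule is not specified here, and the names `bilinear_sample` and `ffd_warp` are hypothetical.

```python
import numpy as np


def bilinear_sample(img, ys, xs):
    """Sample a 2-D array at fractional (row, col) coordinates, clamping at the border."""
    h, w = img.shape
    ys = np.clip(ys, 0, h - 1)
    xs = np.clip(xs, 0, w - 1)
    y0 = np.floor(ys).astype(int)
    x0 = np.floor(xs).astype(int)
    y1 = np.minimum(y0 + 1, h - 1)
    x1 = np.minimum(x0 + 1, w - 1)
    wy = ys - y0
    wx = xs - x0
    top = img[y0, x0] * (1 - wx) + img[y0, x1] * wx
    bottom = img[y1, x0] * (1 - wx) + img[y1, x1] * wx
    return top * (1 - wy) + bottom * wy


def ffd_warp(img, grid_disp):
    """Classical (non-adaptive) FFD warp driven by a regular grid of control points.

    grid_disp has shape (gh, gw, 2) and holds the (dy, dx) displacement of each
    control point; control points are spread evenly from the first to the last
    pixel along each axis.
    """
    img = np.asarray(img, dtype=float)
    h, w = img.shape
    gh, gw, _ = grid_disp.shape
    # Position of every pixel expressed in control-grid coordinates.
    gy, gx = np.meshgrid(np.linspace(0, gh - 1, h), np.linspace(0, gw - 1, w), indexing="ij")
    # Dense displacement field: bilinear interpolation of the control-point displacements.
    dy = bilinear_sample(grid_disp[..., 0], gy, gx)
    dx = bilinear_sample(grid_disp[..., 1], gy, gx)
    yy, xx = np.meshgrid(np.arange(h, dtype=float), np.arange(w, dtype=float), indexing="ij")
    # Backward mapping: each output pixel reads the input at its displaced location.
    return bilinear_sample(img, yy - dy, xx - dx)
```

A zero displacement grid returns the image unchanged, and a uniform `dx = 1` shifts the content one pixel to the right. In practice a library resampler such as `scipy.ndimage.map_coordinates` could replace the hand-written sampler.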

**Fig. 2: Illustration of Adaptive Deformation.**

The displacement vector ΔP_i for each control point i depends not only on the original position of the control point but also on the feature changes in the surrounding local area. The direction and magnitude of the displacement vector are determined by the following process:

$$\Delta {P}_{i}=f(\nabla I({X}^{s}),{\theta }_{i})$$

(3)

where ∇I(X^s) represents the image gradient around control point i, and θ_i is an automatically adjusted parameter that modifies the displacement direction and range based on the image content. This adjustment ensures that the movement of control points enhances the structural representation of the image. θ_i is computed automatically in one of the following two ways, depending on the domain:

For handprint source domain samples X^s with clear structures and distinct strokes, this paper utilizes the edge intensity and directionality of the image as the primary features to dynamically compute θ_i:

$$\begin{array}{l}{\theta }_{i}({X}^{s})={\alpha }_{1}\cdot EdgeMagnitude({X}^{s})+{\alpha }_{2}\cdot EdgeOrientation({X}^{s})\\ EdgeMagnitude({X}^{s})=\sqrt{{I}_{x}^{2}+{I}_{y}^{2}}\\ EdgeOrientation({X}^{s})=\arctan ({I}_{y}/{I}_{x})\\ {I}_{x}={G}_{x}* {X}^{s}\\ {I}_{y}={G}_{y}* {X}^{s}\end{array}$$

(4)

where EdgeMagnitude(X^s) and EdgeOrientation(X^s) represent the edge strength and direction near control point i. EdgeMagnitude quantifies how strongly the image intensity changes near the control point, which is useful for controlling the amount of deformation. EdgeOrientation helps align the movement direction of the control point with the local edge direction, thus maintaining the continuity and integrity of the image structure. I_x and I_y are the first-order derivatives of the handprint source domain sample near control point i along the x and y directions, obtained by convolving (*) the image with G_x and G_y. G_x and G_y are Sobel operators, commonly used for edge detection because they respond strongly to the high-spatial-frequency regions that correspond to edges. α₁ and α₂ are coefficients that weigh the contributions of edge magnitude and edge orientation, respectively.
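The source-domain rule in Eq. (4) can be sketched in NumPy as follows. The sketch assumes that "near control point i" means a square window of hypothetical radius `radius` around the point, averages the gradient magnitude over that window, and takes the orientation of the summed gradient with `arctan2` so that the angle is defined even when I_x is zero; the window, the aggregation, the function names, and the default `alpha1`/`alpha2` values are illustrative assumptions, not settings from the paper. The Sobel filters are applied as cross-correlation, as in most deep learning frameworks, which flips the gradient sign relative to true convolution but leaves the magnitude unchanged.

```python
import numpy as np

# Sobel kernels G_x and G_y; G_y is the transpose of G_x.
SOBEL_X = np.array([[-1.0, 0.0, 1.0],
                    [-2.0, 0.0, 2.0],
                    [-1.0, 0.0, 1.0]])
SOBEL_Y = SOBEL_X.T


def filter3x3(img, kernel):
    """Cross-correlate a 2-D image with a 3x3 kernel, edge-padded to keep the size."""
    h, w = img.shape
    padded = np.pad(img, 1, mode="edge")
    out = np.zeros((h, w))
    for dy in range(3):
        for dx in range(3):
            out += kernel[dy, dx] * padded[dy:dy + h, dx:dx + w]
    return out


def theta_source(img, center, radius=2, alpha1=1.0, alpha2=0.1):
    """Hypothetical theta_i for a control point at `center` = (row, col) in a source image."""
    img = np.asarray(img, dtype=float)
    ix = filter3x3(img, SOBEL_X)  # I_x
    iy = filter3x3(img, SOBEL_Y)  # I_y
    r, c = center
    win = (slice(max(r - radius, 0), r + radius + 1),
           slice(max(c - radius, 0), c + radius + 1))
    # EdgeMagnitude: mean of sqrt(I_x^2 + I_y^2) over the window.
    magnitude = np.sqrt(ix[win] ** 2 + iy[win] ** 2).mean()
    # EdgeOrientation: angle of the summed gradient; arctan2 keeps the quadrant.
    orientation = np.arctan2(iy[win].sum(), ix[win].sum())
    return alpha1 * magnitude + alpha2 * orientation
```

On a flat image both terms vanish; on a vertical step edge the orientation term is zero and θ_i is driven by the edge magnitude alone.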

For the target domain topographic samples X^t that include noise and degradation, this paper utilizes local contrast and noise suppression as the primary features to dynamically compute θ_i:

$$\begin{array}{ll}&{\theta }_{i}({X}^{t})={\beta }_{1}\cdot LocalContrast({X}^{t})+{\beta }_{2}\cdot NoiseSuppression({X}^{t})\\ &LocalContrast({X}^{t})=\frac{1}{W\times H}\mathop{\sum}\limits _{x,y}\parallel {X}_{x,y}^{t}-{\mu }_{{{local}}}\parallel \\ &NoiseSuppression({X}^{t})=1-\frac{{\tau }_{{{local}}}}{{\tau }_{{{global}}}}\end{array}$$

(5)

where LocalContrast(X^t) represents the local contrast, emphasizing the visibility of important features in the image; NoiseSuppression(X^t) represents the degree of noise suppression, which helps reduce the impact of noise in the control point displacement; β₁ and β₂ are coefficients that weigh the contributions of local contrast and noise suppression, respectively. ${X}_{x,y}^{t}$ is the pixel value of the image X^t at position (x, y), μ_local is the average pixel value in the area near control point i of X^t, W and H are the width and height of the local window, τ_local is the standard deviation of pixel values in the area near control point i of X^t, τ_global is the standard deviation of pixel values across the entire image X^t.
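For the target domain, Eq. (5) can be sketched in the same way. This NumPy version assumes a square local window of hypothetical radius `radius` (so W = H = 2·radius + 1, clipped at the image border), treats pixels as scalars so that ∥·∥ becomes an absolute value, and maps a constant image (τ_global = 0) to zero noise suppression; that last rule is an assumption, since Eq. (5) is undefined in that case. The function name and default β values are illustrative.

```python
import numpy as np


def theta_target(img, center, radius=3, beta1=1.0, beta2=1.0):
    """Hypothetical theta_i for a control point at `center` = (row, col) in a target image."""
    img = np.asarray(img, dtype=float)
    r, c = center
    # Local W x H window around the control point, clipped at the image border.
    local = img[max(r - radius, 0):r + radius + 1,
                max(c - radius, 0):c + radius + 1]
    mu_local = local.mean()
    # LocalContrast: mean absolute deviation from the local mean (scalar pixels).
    local_contrast = np.abs(local - mu_local).mean()
    tau_local = local.std()
    tau_global = img.std()
    # NoiseSuppression = 1 - tau_local / tau_global; a constant image is mapped to 0
    # here, which is an assumption rather than part of Eq. (5).
    noise_suppression = 1.0 - tau_local / tau_global if tau_global > 0 else 0.0
    return beta1 * local_contrast + beta2 * noise_suppression
```

A control point in a flat region of an otherwise varied image gets zero local contrast and full noise suppression (1.0), so with these defaults θ_i equals β₂.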

With the improvements described above, the ADM enables the creation of enhanced samples F^s from the handprint source domain samples X^s and F^t from the unlabeled target domain topographic samples X^t, as follows:

$${F}^{i}={T}_{{{ENHANCE}}\_{{FFD}}}({X}^{i}),\quad i\in \{s,t\}$$

(6)

This module aids in increasing data diversity and the model’s generalization ability without compromising the original semantic content of the Oracle Bone Script images. By implementing these deformations, the images retain their essential characteristics while adapting to variations that might be encountered in real-world scenarios, thus enhancing the robustness and accuracy of subsequent recognition tasks.

Texture–Structure Decoupling Module

To effectively process Oracle Bone Script images, especially topographic data with complex textures, it is crucial to distinguish between structural features and texture features.

Structural features refer to the intrinsic, shape-related, and semantically meaningful components that constitute the characters themselves. These features are fundamental to character identity and recognition. Specifically, structural features in Oracle Bone Script images primarily include: character shape and outline, stroke composition and arrangement, geometric information, and topological structure. These structural features are identified by focusing on the essential lines and curves that define the character’s form, while minimizing the influence of extraneous elements. Conversely, texture features in Oracle Bone Script images refer to the surface-level visual patterns and variations that are not directly related to the character’s semantic identity or structural form. These features are often domain-specific noise or artifacts introduced by the material properties of oracle bones, the carving process, the aging process, and the image acquisition process. Texture features in Oracle Bone Script images typically include: surface noise and grain, cracks and fissures, stains and discolorations, blurring and degradation, and wear and erosion marks.

Handprint images of Oracle Bone Script mainly contain structural features, whereas topographic data also contain textural features¹⁴. The TSDM aims to extract structural features ${f}_{1}^{t}$ and textural features ${f}_{2}^{t}$ from the enhanced samples F^t of the unlabeled target domain. The core idea of this module is to separate the two kinds of features by minimizing the structural difference and maximizing the textural difference between F^t and the enhanced handprint source domain samples F^s.

The structural features of Oracle Bone Script images primarily consist of their shape, outline, and other geometric information. The problem of extracting structural features ${f}_{1}^{t}$ can be expressed as:

$${f}_{1}^{t}=\arg \mathop{\min }\limits_{{f}_{1}^{t}}\parallel S({F}^{t})-S({F}^{s}){\parallel }^{2}$$

(7)

where S represents the operation for extracting structural features. F^t contains both structural and textural information, whereas F^s contains essentially only structural information; minimizing the difference between their structural representations therefore drives S to retain the shared structural content and suppress the target-specific texture.

For the textural features of Oracle Bone Script images ${f}_{2}^{t}$, the problem can be expressed as:

$${f}_{2}^{t}=\arg \mathop{\max }\limits_{{f}_{2}^{t}}\parallel \Gamma ({F}^{t})-\Gamma ({F}^{s}){\parallel }^{2}$$

(8)

where Γ represents the operation for extracting textural features. Contrary to the extraction of structural features ${f}_{1}^{t}$, the textural features ${f}_{2}^{t}$ can be obtained by maximizing the difference. This approach emphasizes distinguishing the unique textural properties found in the target domain samples from those in the source domain samples.

In formulas (7) and (8), S represents an abstract operation for extracting structural features from an Oracle Bone Script image. It is not a single, fixed algorithm but a conceptual representation of isolating and emphasizing the shape-related, semantically meaningful components of the image while suppressing texture-related noise and variations. In this paper, S is implemented in the structure branch by attention mechanisms (within MLSPAM, applied to structural features), guided by the structural feature loss Loss_structure.

Similarly, Γ represents an abstract operation for extracting texture features. It conceptually aims to isolate and capture the surface-level visual patterns and variations that are distinct from the character’s structural form and semantic content. Like S, Γ is not a single algorithm but a representation of the texture feature extraction process within the TSDM. In this paper, Γ is implemented in the texture branch by network layers trained with the texture feature loss Loss_texture.

In summary, operations S and Γ, as presented in formulas (7) and (8), are conceptual abstractions representing the distinct goals of structural and texture feature extraction within the TSDM. Operation S (structural feature extraction) aims to isolate and emphasize shape-related, semantically meaningful components, while Operation Γ (texture feature extraction) aims to capture surface-level visual patterns and variations distinct from structural form.

Therefore, the structural feature loss Loss_structure can be expressed as:

$$Los{s}_{structure}=\frac{1}{N}\mathop{\sum }\limits_{i=1}^{N}\parallel {f}_{1}^{s}(i)-{f}_{1}^{t}(i){\parallel }_{2}^{2}$$

(9)

This structural feature loss minimizes the structural differences between enhanced samples from the source and target domains, measuring them as the mean of squared Euclidean distances. Here, $\parallel \cdot {\parallel }_{2}^{2}$ represents the squared Euclidean distance, N is the total number of samples, and ${f}_{1}^{s}(i)$ and ${f}_{1}^{t}(i)$ represent the structural features of the ith sample from the source and target domains, respectively.
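The structural loss is a batch mean of squared Euclidean distances. A minimal NumPy sketch follows; it assumes, as the per-sample summation in Eq. (9) implies, that source and target features are paired by their position in the batch, and it flattens each sample's feature map so the norm runs over all of its entries. The function name is hypothetical.

```python
import numpy as np


def structure_loss(struct_s, struct_t):
    """Eq. (9): (1/N) * sum_i ||source_i - target_i||_2^2 over N paired samples."""
    a = np.asarray(struct_s, dtype=float)
    b = np.asarray(struct_t, dtype=float)
    n = a.shape[0]
    # Flatten each sample's feature map so the norm is taken over all its entries.
    diff = (a - b).reshape(n, -1)
    return float(np.mean(np.sum(diff ** 2, axis=1)))
```

For two samples whose squared distances are 25 and 0, the loss is their mean, 12.5.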

The texture feature loss Loss_texture can be represented as:

$$Los{s}_{texture}=\frac{1}{N}\mathop{\sum }\limits_{i=1}^{N}\frac{{f}_{2}^{s}(i)\cdot {f}_{2}^{t}(i)}{\parallel {f}_{2}^{s}(i){\parallel }_{2}\parallel {f}_{2}^{t}(i){\parallel }_{2}}$$

(10)

This texture feature loss is designed to maximize the textural differences between the source and target domains. It is the mean cosine similarity between paired texture features, so minimizing it as part of the total loss reduces the similarity between textural features, that is, increases the directional difference between the feature vectors. Here, ∥ ⋅ ∥₂ represents the Euclidean norm of a vector, ⋅ the dot product, and ${f}_{2}^{s}(i)$ and ${f}_{2}^{t}(i)$ are the textural features of the ith sample from the source and target domains, respectively.
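Eq. (10) is built from the mean cosine similarity between paired source and target texture features, which enters the objective with the sign shown in the equation. A minimal NumPy sketch of that quantity follows, assuming features are paired by batch index as in Eq. (9); the function name and the small `eps` guard against all-zero vectors are illustrative additions.

```python
import numpy as np


def mean_cosine_similarity(tex_s, tex_t, eps=1e-12):
    """Mean over i of cos(tex_s[i], tex_t[i]), the quantity Eq. (10) is built on."""
    a = np.asarray(tex_s, dtype=float).reshape(len(tex_s), -1)
    b = np.asarray(tex_t, dtype=float).reshape(len(tex_t), -1)
    dots = np.sum(a * b, axis=1)  # f_2^s(i) . f_2^t(i)
    norms = np.linalg.norm(a, axis=1) * np.linalg.norm(b, axis=1)
    # eps guards against division by zero for all-zero feature vectors.
    return float(np.mean(dots / np.maximum(norms, eps)))
```

Because cosine similarity ignores vector length, it compares only the directions of the texture features: parallel vectors score 1, orthogonal vectors 0, and opposite vectors -1.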

Multi-Level Structured Perceptual Attention Module

This module focuses on the multi-level structural features of Oracle Bone Script images, from basic strokes to complex symbol combinations, and applies hierarchical attention learning and integration to them. Specifically, at the micro level it captures the edges and basic shapes of the Oracle Bone Script images, while at the macro level it concentrates on the overall layout, the combination of symbols, and their interrelations. Details of this module are shown in Fig. 3.

Micro-level feature extraction can be represented as:

$${\phi }_{micro}({F}^{s})=ReLU(Con{v}_{3\times 3}({F}^{s}))$$

(11)

Micro-level features are distinguished by their focus on fine-grained details and local structures within the Oracle Bone Script images. These features are extracted using a 3 × 3 convolutional kernel (Conv3 × 3), which provides a smaller receptive field, enabling the module to capture localized patterns. Specifically, micro-level features primarily capture edges and fine strokes of the characters. They also encompass basic shapes and local patterns that form the fundamental building blocks of the characters. Furthermore, these features are sensitive to details within individual strokes, such as subtle curves, junctions, and variations in stroke width. ReLU, the activation function, helps to highlight these micro-structural features.

Macro-level feature extraction can be represented as:

$${\phi }_{macro}({F}^{s})=ReLU(Con{v}_{5\times 5}(MaxPool({\phi }_{micro}({F}^{s}))))$$

(12)

Macro-level features, in contrast, are characterized by their emphasis on broader contextual information and overall structural patterns. As Eq. (12) shows, they are extracted by applying max pooling to the micro-level features followed by a 5 × 5 convolutional kernel (Conv5 × 5), which together provide a larger receptive field and allow the module to capture more global context. Macro-level features capture the overall layout and spatial arrangement of strokes within a character. They encompass combinations of strokes and symbols that form larger semantic units within the character. These features are sensitive to the global shape and overall form of the Oracle Bone Script character and capture contextual relationships between its different parts, providing a holistic view. Here, the 5 × 5 kernel captures broader structural features, while max pooling reduces the spatial resolution of the feature maps and enlarges the effective receptive field, enhancing the capture of global image information.

In essence, micro-level features focus on local details and fine structures, while macro-level features concentrate on the overall layout and broader contextual information of the Oracle Bone Script characters.

For the micro-level and macro-level features of Oracle Bone Script images, an attention module is designed for each level to learn the importance of the structural features at that level. This can be expressed as:

$$\begin{array}{l}{A}_{micro}({F}^{s})=\sigma (Conv({\phi }_{micro}({F}^{s})))\\ {A}_{macro}({F}^{s})=\sigma (Conv({\phi }_{macro}({F}^{s})))\\ \end{array}$$

(13)

where σ is the sigmoid function, and Conv is a convolution operation used to learn spatial attention from the hierarchical features.

The attention-weighted features from both the micro and macro levels are fused to obtain a comprehensive output feature Total(F^s):

$$\begin{array}{ll}Total({F}^{s})\,={\omega }_{micro}\cdot ({A}_{micro}({F}^{s})\bigodot {\phi }_{micro}({F}^{s}))\\\qquad\qquad\quad +\,{\omega }_{macro}\cdot ({A}_{macro}({F}^{s})\bigodot {\phi }_{macro}({F}^{s}))\\ \end{array}$$

(14)

where ω_micro and ω_macro are the learned weight parameters.
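Eqs. (11)–(14) can be traced on a toy example. The NumPy sketch below processes a single-channel feature map with caller-supplied kernels in place of learned weights and fixes ω_micro and ω_macro to constants; these are simplifications for illustration, and all function names are hypothetical. Two details the equations leave open are filled in with labeled assumptions: max pooling uses a 2 × 2 window with stride 2, and, because Eq. (14) adds a pooled macro map to a full-resolution micro map, the macro branch is upsampled by nearest-neighbour repetition before the weighted sum.

```python
import numpy as np


def conv2d_same(x, k):
    """Single-channel 'same' cross-correlation with zero padding (odd square kernel)."""
    p = k.shape[0] // 2
    xp = np.pad(x, p)
    h, w = x.shape
    out = np.zeros((h, w))
    for i in range(k.shape[0]):
        for j in range(k.shape[1]):
            out += k[i, j] * xp[i:i + h, j:j + w]
    return out


def relu(x):
    return np.maximum(x, 0.0)


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def max_pool2(x):
    """2x2 max pooling with stride 2 (expects even height and width)."""
    h, w = x.shape
    return x.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))


def mlspam_fuse(f, k3, k5, a_micro, a_macro, w_micro=0.5, w_macro=0.5):
    """Eqs. (11)-(14) on one channel; kernels and weights are supplied by the caller."""
    f = np.asarray(f, dtype=float)
    if f.shape[0] % 2 or f.shape[1] % 2:
        raise ValueError("this sketch expects even height and width")
    phi_micro = relu(conv2d_same(f, k3))                     # Eq. (11): 3x3 conv + ReLU
    phi_macro = relu(conv2d_same(max_pool2(phi_micro), k5))  # Eq. (12): pool, 5x5 conv, ReLU
    att_micro = sigmoid(conv2d_same(phi_micro, a_micro))     # Eq. (13): spatial attention
    att_macro = sigmoid(conv2d_same(phi_macro, a_macro))
    micro = att_micro * phi_micro
    macro = att_macro * phi_macro
    # Assumption: nearest-neighbour upsampling aligns the pooled macro map with the micro map.
    macro_up = np.repeat(np.repeat(macro, 2, axis=0), 2, axis=1)
    return w_micro * micro + w_macro * macro_up              # Eq. (14)
```

With identity kernels and zero-valued attention kernels, every attention weight is σ(0) = 0.5, which makes the contribution of each branch easy to check by hand. A real implementation would use multi-channel learned convolutions in a deep learning framework.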

Classification via the addition of a fully connected layer can be represented as:

$${{\Phi }}\,=softmax(W\cdot Total({F}^{s})+b)$$

(15)

where W and b are the weight and bias parameters, respectively.

The classification loss Loss_category uses cross-entropy and can be represented as:

$$Los{s}_{category}=-\mathop{\sum }\limits_{c = 1}^{C}{y}_{c}\log ({{\Phi }}_{c})$$

(16)

where C is the total number of categories, y_c indicates whether category c is the correct classification for the sample, and Φ_c, the cth component of Φ, represents the probability of the sample being classified as c.
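For a single sample with a one-hot label, Eq. (16) reduces to the negative log-probability of the true class. The NumPy sketch below, with hypothetical function names and a flattened feature vector assumed as the input to the fully connected layer, computes Eq. (15) with the usual max-subtraction for numerical stability and then that reduced form of Eq. (16).

```python
import numpy as np


def classify(total_feat, W, b):
    """Eq. (15): softmax(W . Total(F^s) + b) over C classes."""
    z = W @ np.ravel(total_feat) + b
    z = z - z.max()  # shifting logits leaves softmax unchanged but avoids overflow
    e = np.exp(z)
    return e / e.sum()


def category_loss(probs, label, eps=1e-12):
    """Eq. (16) with one-hot y: only the true-class term survives, -log Phi_label."""
    return float(-np.log(max(probs[label], eps)))
```

With all-zero weights and biases the classifier is uniform over C classes, so the loss equals log C for any label.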

The domain discrepancy loss function Loss_gap is represented as:

$$Los{s}_{gap}=\parallel Total({F}^{s})-{f}_{1}^{t}{\parallel }_{2}^{2}$$

(17)

where $\parallel\!\! \cdot {\parallel }_{2}^{2}$ denotes the squared Euclidean distance. This loss function aims to minimize the Euclidean distance between the weighted features of the source domain enhanced sample F^s and the structural features ${f}_{1}^{t}$ of the target domain, encouraging the model to find a consistent structural feature representation across the two different domains. This approach effectively reduces the discrepancies in structural features between the source and target domains, enhancing the model’s generalization capabilities on Oracle Bone Script topographic data, thus enabling more accurate classification of unseen oracle bone topographic data.

Total loss function

The total loss function for the paper Loss_Total can be specifically expressed as:

$$Los{s}_{Total}={\lambda }_{1}Los{s}_{structure}+{\lambda }_{2}Los{s}_{texture}+{\lambda }_{3}Los{s}_{gap}+{\lambda }_{4}Los{s}_{category}$$

(18)

where Loss_structure represents the loss associated with structural features within the TSDM, Loss_texture represents the loss associated with textural features within the same module, Loss_gap represents the domain discrepancy loss that aligns source and target structural features, and Loss_category represents the classification loss in the MLSPAM. The parameters λ₁, λ₂, λ₃, and λ₄ are weights that balance the contributions of the different losses.
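The sketch below shows how the domain discrepancy term of Eq. (17) and the weighted combination of Eq. (18) fit together. It assumes Total(F^s) and ${f}_{1}^{t}$ have already been brought to the same shape, which Eq. (17) requires but the text does not describe; the function names are hypothetical, and the default λ values are placeholders rather than the paper's settings.

```python
import numpy as np


def gap_loss(total_s, f1_t):
    """Eq. (17): squared Euclidean distance between fused source and target structural features."""
    d = np.ravel(np.asarray(total_s, dtype=float)) - np.ravel(np.asarray(f1_t, dtype=float))
    return float(d @ d)


def total_loss(l_structure, l_texture, l_gap, l_category, lambdas=(1.0, 1.0, 1.0, 1.0)):
    """Eq. (18): weighted sum of the four losses; the default lambdas are placeholders."""
    l1, l2, l3, l4 = lambdas
    return l1 * l_structure + l2 * l_texture + l3 * l_gap + l4 * l_category
```

In training, the same arithmetic would run inside an automatic-differentiation framework so that gradients reach every module; NumPy is used here only to make the computation explicit.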

Synergistic operation of OracleNet modules

OracleNet’s strength lies in the synergistic operation of its three core modules: the ADM, the TSDM, and the MLSPAM. These modules are not isolated units but are designed to work in concert, optimizing different aspects of the Oracle Bone Script recognition task in a coordinated manner.

Sequential data flow

The input Oracle Bone Script image first enters the ADM. The ADM adaptively deforms the image, mitigating intra-domain variations and enhancing the prominence of structural features. This deformation process ensures that subsequent modules operate on a structurally refined input. The output from the ADM then flows into the TSDM. The TSDM performs a crucial role in disentangling texture and structural information. By separating these features, the TSDM allows the model to focus on learning domain-invariant structural representations, effectively handling the texture noise inherent in Oracle Bone Script images. Both feature streams from the TSDM are then fed into the MLSPAM. The MLSPAM is designed to extract and refine multi-scale structural features hierarchically. Utilizing attention mechanisms at both macro and micro levels, MLSPAM focuses on the most salient parts of the Oracle Bone Script, further enhancing feature discrimination. Finally, the refined feature representation from the MLSPAM is passed to the classifier to produce the recognition output.

Synergistic optimization through joint training

OracleNet is trained end to end, allowing the modules to be optimized jointly and interdependently. The total loss Loss_Total orchestrates this joint optimization by combining Loss_structure, Loss_texture, Loss_gap, and Loss_category. During backpropagation, gradients from Loss_Total flow through the MLSPAM, TSDM, and ADM, guiding the learning process in each module. This joint training enables synergistic learning: the ADM learns to provide optimal inputs for the TSDM, the TSDM learns to extract features that best leverage the MLSPAM’s attention mechanism, and the MLSPAM learns to focus on the most discriminative features from the structure-enhanced and texture-decoupled representations.

Complementary roles

Each module plays a complementary role in the overall optimization of OracleNet. The ADM reduces data variance and standardizes input, the TSDM disentangles confounding factors of texture, allowing a focus on structure, and the MLSPAM provides refined, multi-scale feature analysis. This carefully orchestrated modular design, combined with joint end-to-end training, is what enables OracleNet to achieve superior performance in Oracle Bone Script recognition.

Results

Datasets

The Oracle-241 dataset¹³ contains approximately 80,000 images of Oracle Bone Script characters in 241 categories, in both handprinted and topographic form, and is used for unsupervised domain adaptation. The training set comprises 10,861 labeled handprinted images and 50,168 unlabeled topographic images; the test set comprises 3,730 handprinted images and 13,806 topographic images. Because of long burial and careless excavation, the Oracle Bone Script images in Oracle-241 exhibit extremely severe and distinctive noise, and most categories include multiple writing styles, which makes recognition and adaptation more difficult (Fig. 4).

**Fig. 4: Examples in Oracle-241 dataset (left), OBC306 dataset (right) and Oracle-MNIST dataset (bottom).**

The OBC306 dataset²⁰ is currently the largest Oracle Bone Script dataset, containing 309,551 samples in 306 categories, each corresponding to a unique Oracle Bone Script character; it is used as a pattern classification benchmark. As shown in Fig. 4, all samples in OBC306 are extracted from real Oracle Bone Script fragments. The data are split into training and test sets at a 3:1 ratio.

The Oracle-MNIST dataset²¹ consists of 30,222 grayscale images of ancient characters, each 28 × 28 pixels, in 10 categories. It is used as a pattern classification benchmark, particularly for challenges related to image noise and distortion. The training set contains 27,222 images and the test set contains 300 images per category (3,000 in total), a training-to-test ratio of roughly 9:1.
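As a quick arithmetic check, the split sizes quoted in this section can be tallied directly. The snippet only restates numbers given in the text.

```python
# Split sizes as stated in the Datasets section.
oracle_241 = {
    "train_handprint_labeled": 10_861,
    "train_topographic_unlabeled": 50_168,
    "test_handprint": 3_730,
    "test_topographic": 13_806,
}
oracle_mnist_train = 27_222
oracle_mnist_test = 10 * 300  # 10 categories x 300 test images each

total_241 = sum(oracle_241.values())
mnist_ratio = oracle_mnist_train / oracle_mnist_test

print(f"Oracle-241 total images: {total_241:,}")                                  # 78,565
print(f"Oracle-MNIST total images: {oracle_mnist_train + oracle_mnist_test:,}")  # 30,222
print(f"Oracle-MNIST train:test ≈ {mnist_ratio:.2f}:1")                          # 9.07:1
```

The Oracle-241 total is consistent with the "approximately 80,000 images" figure, and the Oracle-MNIST counts add up to the stated 30,222 images.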

Implementation details

Model

Both the handprinted source-domain samples X^s and the topographic target-domain samples X^t are 224 × 224 pixels with 3 color channels. In the ADM, the number of augmented samples generated for the source domain (F^s) and the target domain (F^t) is set to 20, which increases sample diversity without excessive computational overhead. The control-point displacement is set to 17 and the number of adaptive control points to 19, allowing fine adjustment of details while keeping the overall structure stable. The coefficients for edge magnitude (α₁), edge direction (α₂), local contrast (β₁), and noise suppression (β₂) are set to 0.53, 0.62, 0.74, and 0.43, respectively. The micro-feature extractor uses 2 convolutional layers, each with 32 filters of size 3 × 3 and stride 1. The macro-feature extractor uses 4 convolutional layers with 64, 128, 256, and 512 filters of size 5 × 5 and stride 1, together with 2 × 2 max-pooling layers with stride 2. All hidden layers use ReLU activations, and the attention layers use a sigmoid output.
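To make the layer configuration concrete, the sketch below tallies output shapes and parameter counts for the micro and macro feature extractors on a 224 × 224 × 3 input. Several details are not stated in the text and are assumptions here: "same" padding for every convolution, one 2 × 2 max-pooling layer after each macro convolution, one bias per filter, and both branches reading the 3-channel image directly.

```python
from dataclasses import dataclass


@dataclass
class Conv:
    filters: int
    kernel: int
    pool_after: bool  # assumed 2x2, stride-2 max pooling after this conv


def summarize(layers, size=224, channels=3):
    """Per-layer (H, W, C, params), assuming 'same' padding and stride 1."""
    rows = []
    for layer in layers:
        # weights (k * k * C_in * C_out) plus one bias per filter
        params = layer.kernel * layer.kernel * channels * layer.filters + layer.filters
        channels = layer.filters   # stride 1 with 'same' padding keeps H and W
        if layer.pool_after:
            size //= 2             # 2x2 window with stride 2 halves H and W
        rows.append((size, size, channels, params))
    return rows


micro = [Conv(32, 3, False), Conv(32, 3, False)]
macro = [Conv(f, 5, True) for f in (64, 128, 256, 512)]

for name, layers in (("micro", micro), ("macro", macro)):
    for h, w, c, p in summarize(layers):
        print(f"{name}: {h}x{w}x{c}, {p:,} params")
```

Under these assumptions the macro branch ends at 14 × 14 × 512, and its last 5 × 5 layer alone holds about 3.28 million parameters, far more than the whole micro branch (about 10,000).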

Experiment setup

We use the Adam optimizer with an initial learning rate of 0.001, decayed by a factor of 0.1 every 10,000 iterations, and train for 90,000 iterations with a batch size of 64. The loss weights λ₁, λ₂, λ₃, and λ₄ are set to 0.48, 0.47, 0.54, and 0.52, respectively. All experiments were run on a single NVIDIA GeForce RTX 3090 GPU.
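The step schedule described above can be written as lr(t) = 0.001 · 0.1^⌊t / 10,000⌋. The framework-independent sketch below evaluates it; taken literally over 90,000 iterations, the rate passes through eight decay steps and reaches about 1e-11 at the final iteration. In PyTorch the same schedule corresponds to `torch.optim.lr_scheduler.StepLR(optimizer, step_size=10000, gamma=0.1)` stepped once per iteration; whether the authors used that class is not stated.

```python
def step_lr(iteration: int, base_lr: float = 1e-3, gamma: float = 0.1,
            step_size: int = 10_000) -> float:
    """Learning rate at a given iteration under step decay."""
    return base_lr * gamma ** (iteration // step_size)


# Sample the schedule at a few iterations, including the last one (89,999).
for it in (0, 9_999, 10_000, 50_000, 89_999):
    print(it, step_lr(it))
```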

Evaluation

We follow the evaluation protocol established in ref. 22. All labeled source characters and all unlabeled target characters are used for training, and we report the mean classification accuracy and standard deviation over three independent runs.
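Reporting mean ± standard deviation over three runs can be sketched with Python's `statistics` module. The text does not say whether the sample (n − 1) or population (n) standard deviation is used; this sketch assumes the sample version, and the accuracies in the usage line are placeholders, not reported results.

```python
import statistics


def summarize_runs(accuracies):
    """Format accuracies (in %) from independent runs as 'mean ± std'."""
    if len(accuracies) < 2:
        raise ValueError("need at least two runs for a sample standard deviation")
    mean = statistics.mean(accuracies)
    std = statistics.stdev(accuracies)  # sample std (n - 1); use pstdev for n
    return f"{mean:.1f} ± {std:.1f}"


print(summarize_runs([70.0, 71.0, 72.0]))  # placeholder values → 71.0 ± 1.0
```

With only three runs the two conventions differ noticeably (a factor of √(3/2) ≈ 1.22), so stating which one is used matters for comparing the ± values across papers.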

Comparison with other methods

Results on Oracle-241

Table 1 reports results on the Oracle-241 dataset, evaluating how well each method transfers recognition knowledge from handprinted Oracle Bone Script characters to topographic data. We include NN-DML (2019)¹⁵ as the “Source-only” baseline, since it is trained solely on handprinted data without any adaptation. The other models^{13,14,15,16,17,18,19,23,24} are trained on handprinted data and additionally perform domain adaptation.

Table 1 Source (Handprint, corresponding to “Handprinted Character” in STSN¹³) and target (Topographic, corresponding to “Scanned Data” in STSN¹³) accuracy (mean ± std%) on Oracle-241 dataset

From the results in Table 1, we draw the following conclusions: (1) Without adaptation, the “Source-only” model performs poorly on topographic data, reflecting the distribution gap between handprinted and topographic data. (2) STSN, the first work to address the domain gap in Oracle Bone Script recognition, achieves good results thanks to its joint and transformation modules; UDCN performs well in the target domain owing to its unsupervised discriminative learning design. (3) OracleNet achieves the best accuracy on both the topographic target data and the handprinted source data. We attribute this to three factors: the ADM increases the model’s tolerance to sample variation, the TSDM separates and processes structural and textural features so that the model captures more discriminative characteristics, and the MLSPAM performs hierarchical attention learning and fusion at macro and micro levels, enabling the model to localize discriminative regions precisely.

Results on OBC306

On the OBC306 dataset, we compare source and target accuracy against SADE²⁵, ResLT²⁶, PaCO²⁷, and MixupAda²⁸. The results are shown in Table 2.

Table 2 Source and target accuracy (mean ± std%) on OBC306 dataset

From the results in Table 2, we draw the following conclusions: (1) MixupAda performs well because it exploits the complementarity between adversarial data augmentation and a hybrid generator. (2) OracleNet achieves the best results on OBC306, which we attribute mainly to the high-quality samples generated by the ADM.

Results on Oracle-MNIST

On the Oracle-MNIST dataset, we compare average accuracy and overall accuracy against VGG-16²⁹, AlexNet³⁰, ResNet50³¹, Inception-V3³², and LR-Net³³.

From the results in Table 3, we draw the following conclusions: (1) LR-Net achieves good results because it classifies images with higher confidence. (2) OracleNet achieves the best performance, which we attribute to the TSDM effectively separating out the extensive noise present in Oracle-MNIST images.

Table 3 Source and target accuracy (mean ± std%) on Oracle-MNIST dataset

Ablation study

To better understand the contribution of each module within the model, we conducted an ablation study, systematically removing or isolating specific modules and observing their impact on model performance. The focus of the ablation study was on the ADM, TSDM, and MLSPAM. Table 4 summarizes the study results, highlighting the importance of each component in achieving high performance and effective feature integration.

Table 4 Ablation study on Oracle-241 dataset

(1) Effectiveness of ADM (Model-A vs. Full Model). To explore the role of the ADM, we first built Model-A, which omits this module. Compared with the Full Model, Model-A shows a significant decline in performance, with accuracy dropping by 12.4%. This highlights the module’s critical role in augmenting data from the source and target domains: it uses data augmentation to make source-domain samples more similar to target-domain samples, thereby improving the model’s generalization across domains.

(2) Importance of TSDM (Model-B vs. Full Model). To evaluate the impact of the TSDM, Model-B was configured without it. Compared with the Full Model, removing this module reduced performance by 9.7%. This shows that the module’s ability to separate and recombine structural and textural features is crucial for handling the complex nature of Oracle Bone Script, and confirms its necessity for accurate feature extraction and domain adaptation.

(3) Contribution of MLSPAM (Model-C vs. Full Model). In Model-C, we removed the MLSPAM to isolate its impact. The performance of this model dropped by 16.8%, highlighting the importance of this module in focusing on features of varying scales and complexities. The attention mechanism of this module enhances the model’s sensitivity to key structural details, which are crucial for the classification and interpretation of Oracle Bone Script.

(4) Integrated Analysis of All Modules (Model-D vs. Model-C vs. Full Model). We also compared Model-D (without the ADM and TSDM) and Model-C (without the MLSPAM) with the Full Model to examine the interactions between these modules. The results show that while each module plays a significant role individually, their integration synergistically enhances performance. Model-C, lacking only the attention module, performed better than Model-D, but still underperformed compared to the Full Model, highlighting the compounded advantages of multiple modules working together.

Sensitivity analysis

In this section, we examine the sensitivity of key model parameters to understand their impact on performance and to determine optimal settings. The analysis has two parts: the parameters of the ADM and the weights in the total loss function.

Sensitivity analysis of N(X^s), ΔP_i, and other parameters in the ADM

We analyzed the sensitivity of the model to the number of adaptive control points N(X^s), the displacement vector ΔP_i of control point i, and the other ADM coefficients.

(1) N(X^s): As illustrated in Fig. 5, the accuracy of the model initially increases with the number of control points N(X^s). When the number of control points is low, the model’s accuracy rapidly improves, indicating that increasing the number of control points significantly enhances the model’s ability to capture details in Oracle Bone Script images, thus improving recognition performance. However, as the number of control points continues to increase beyond a certain threshold, the increase in accuracy begins to slow down and starts to decline when it reaches about 20 control points. This trend reveals the nonlinear impact of increasing the number of control points on model performance.

The reason is that when there are fewer control points, each control point covers a larger area of the image, which limits the model’s ability to deform at detailed locations. In Oracle Bone Script images, many details such as the beginning and end of strokes and changes in angles are crucial for correct identification; therefore, an initial increase in control points significantly boosts recognition accuracy. As the number of control points increases, each point controls a smaller area of the image, allowing for finer deformations, but this may also lead to overfitting to local features while neglecting the overall structure, especially when there are too many control points. Overemphasis on irrelevant details (such as image noise or non-structured background parts) may affect the model’s generalization ability. Additionally, the increase in computational cost is a disadvantage of having too many control points.

(2) ΔP_i: As shown in Fig. 5, the size of the displacement vectors directly controls the magnitude of deformation. Displacements that are too small may under-adjust the samples and fail to bridge the differences between domains, while displacements that are too large can excessively distort image details and reduce recognition accuracy. Performance is best when the displacement magnitude lies within an intermediate range.

(3) α₁, α₂, β₁, β₂: As shown in Fig. 6, adjustments to α₁, α₂, β₁, β₂ influence accuracy.

For α₁ (edge magnitude coefficient), increasing the value initially improves classification accuracy, as the model identifies image edges and contours more clearly; these are crucial for recognizing the structural features of Oracle Bone Script. When α₁ becomes too high, however, accuracy may decline, because overemphasizing edges can amplify image noise and hurt generalization: the model may misinterpret minor or irrelevant variations as structural features, leading to classification errors. An appropriate α₁ lets the model retain important structural information while avoiding interference from noise and non-structural content.

For α₂ (edge direction coefficient), increasing α₂ also improves accuracy within a certain range, because it helps the model capture edge directions more accurately, which is crucial for analyzing the shape and orientation of characters. An overly high α₂, however, may make the model too sensitive to edge-direction details, including natural deformations or slight distortions in the image, which can disrupt its understanding of the overall structure and lead to misjudgments. A properly tuned α₂ optimizes the model’s interpretation of Oracle Bone Script stroke directions and thus supports accurate character recognition.

For β₁ (local contrast coefficient), increasing β₁ initially improves performance by emphasizing local contrast, helping the model distinguish characters from the background more clearly. If β₁ is too high, however, local contrast can become excessive, obscuring subtle structural details and reducing accuracy; for example, the tiny spaces between characters may be incorrectly filled in. An appropriate β₁ helps the model identify key structural features in Oracle Bone Script images, especially under uneven lighting or varying image quality.

For β₂ (noise suppression coefficient), increasing β₂ helps reduce the impact of noise in the image, initially positively affecting accuracy. However, excessive noise suppression might lead to the loss of important details, especially where minor strokes and cracks in Oracle Bone Script may contain crucial information. Over-suppressing noise can reduce the model’s classification accuracy. Noise suppression controlled by β₂ must balance between reducing noise and preserving important image details. Oracle Bone Script images often contain natural noise due to aging and damage; reducing this noise moderately can clarify the image, but excessive noise suppression might erase subtle traces carrying historical information.
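The text does not specify how the four coefficients enter the ADM. Purely as an illustration, the sketch below assumes they weight per-pixel cue maps into one saliency score, score = α₁·edge_magnitude + α₂·edge_direction + β₁·local_contrast − β₂·noise. It then picks the N highest-scoring pixels as control points and moves each by a random vector whose length is at most a given bound (the implementation details use N = 19 and a displacement of 17; treating 17 as a maximum length is also an assumption). The cue maps, function names, and uniform random displacement are all hypothetical, not the authors' method.

```python
import math
import random

# Coefficients from the implementation details; how they combine is assumed.
ALPHA1, ALPHA2, BETA1, BETA2 = 0.53, 0.62, 0.74, 0.43


def saliency(edge_mag, edge_dir, contrast, noise):
    """Hypothetical per-pixel score: reward edges and contrast, penalize noise."""
    h, w = len(edge_mag), len(edge_mag[0])
    return [[ALPHA1 * edge_mag[y][x] + ALPHA2 * edge_dir[y][x]
             + BETA1 * contrast[y][x] - BETA2 * noise[y][x]
             for x in range(w)] for y in range(h)]


def top_n_points(score, n):
    """(x, y) coordinates of the n highest-scoring pixels."""
    cells = [(score[y][x], x, y)
             for y in range(len(score)) for x in range(len(score[0]))]
    cells.sort(reverse=True)
    return [(x, y) for _, x, y in cells[:n]]


def displace(points, max_disp, width, height, rng=random):
    """Move each point by a random vector of length <= max_disp, kept in bounds."""
    moved = []
    for x, y in points:
        angle = rng.uniform(0.0, 2.0 * math.pi)
        r = rng.uniform(0.0, max_disp)
        # Clamping to the image only shortens the displacement.
        nx = min(max(x + r * math.cos(angle), 0.0), width - 1.0)
        ny = min(max(y + r * math.sin(angle), 0.0), height - 1.0)
        moved.append((nx, ny))
    return moved
```

For a 224 × 224 sample this would be called as `displace(top_n_points(score, 19), 17, 224, 224)`. In a full deformation module, the moved points would then drive a smooth warp of the image (for example a thin-plate spline), which is omitted here.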

Sensitivity analysis of λ₁, λ₂, λ₃, and λ₄ in the total loss

The weights λ₁, λ₂, λ₃, and λ₄ balance the contributions of different components within the total loss function (Fig. 7). We methodically adjust these weights to study their impact on the training dynamics and the output of the model.

(1) Weight variations: Each λ is varied independently while keeping the baseline values of the other weights constant, to isolate their impacts. Weights λ₁ and λ₂, which are associated with the alignment of structural and textural features respectively, exhibit a bell-shaped influence on performance as they increase from 0.1 to 1.0. An optimal range emerges where the model achieves the best balance between alignment and overfitting. In contrast, λ₃ and λ₄, which are related to domain adaptability and classification accuracy, show less sensitivity in the medium range, but become crucial when deviating significantly from this range.

(2) Performance impact: We found that increasing λ₃ slightly improves domain adaptability when set at lower values, but setting it too high can decrease overall accuracy, indicating a need to balance between specific domain adaptability and generalization. The impact of λ₄ on classification performance is the most direct; it remains stable over a broad range but shows a significant decrease when either too high or too low, reflecting its direct influence on classification loss.
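The one-at-a-time protocol described above (vary one λ over 0.1 to 1.0 while holding the others at their baseline values) can be enumerated as follows. The grid step of 0.1 is an assumption, and training and evaluating each configuration is left to a user-supplied routine that is not shown.

```python
BASELINE = {"lambda1": 0.48, "lambda2": 0.47, "lambda3": 0.54, "lambda4": 0.52}
GRID = [round(0.1 * k, 1) for k in range(1, 11)]  # 0.1, 0.2, ..., 1.0 (assumed step)


def one_at_a_time(baseline, grid):
    """Yield (varied_name, config) pairs, changing exactly one weight at a time."""
    for name in baseline:
        for value in grid:
            config = dict(baseline)   # copy so the baseline stays untouched
            config[name] = value
            yield name, config


configs = list(one_at_a_time(BASELINE, GRID))
print(len(configs))  # 4 weights x 10 values = 40 configurations
```

Varying one weight at a time isolates each λ's effect, but it cannot reveal interactions between weights (for example λ₁ and λ₂ jointly); a small grid over pairs would be needed for that.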

Through this sensitivity analysis, we identified the parameter settings that give the best performance, and we use them in all experiments. These settings allow the model to cope with different types of Oracle Bone Script images, improving overall recognition accuracy and robustness.

Image visualization

To demonstrate the effectiveness and interpretability of our model, we use several visualizations that show how it handles the complex transformations involved in Oracle Bone Script recognition. Below, we visualize the effect of each of the three modules.

(1) ADM: As shown in Fig. 8, the images before and after applying the ADM are displayed. The left side shows the original facsimile example before processing, and the right side shows the results of Adaptive Deformation after applying adaptive control points. Through this deformation, local features of the image (such as stroke bending, symbol spacing) are fine-tuned to better mimic the natural morphological changes that may be encountered in topographic samples.

**Fig. 8: Visualization of Adaptive Deformation Module on Oracle-241 dataset.**

This visualization helps understand how the module adjusts the image to fit the specific deformations of the target domain while preserving key structural details. This is crucial for enhancing the model’s adaptability and recognition accuracy on real-world data, especially when there is a significant morphological difference between the target and source domains.

(2) TSDM: As shown in Fig. 9, the effects of the TSDM are displayed. In each set of three-column images, the left side shows the original image, the middle shows the extracted structural features, and the right side shows the extracted texture features. These comparative images demonstrate how the module effectively separates texture information (such as cracks and surface wear) from structural information (such as character edges and strokes).

**Fig. 9: Visualization of Texture–Structure Decoupling Module on Oracle-241 dataset.**

The separation of these features is crucial for enhancing the model’s ability to recognize characters against complex backgrounds. Decoupling texture from structure not only aids the model in more accurately identifying and interpreting Oracle Bone Script but also makes it more robust when dealing with variously degraded or damaged artifacts.

(3) MLSPAM: As shown in Fig. 10, the impact of the attention mechanism on the feature maps is displayed. In each set of three columns, the left side shows the original image, the middle column displays the effects of micro-level attention, and the right side shows macro-level attention. These images demonstrate how the attention mechanism focuses on key features within the image at different levels, such as the fine details of strokes and the overall layout of characters. This multi-level approach ensures that both detailed and global aspects of the characters are adequately emphasized, improving the model’s ability to interpret complex images accurately.

**Fig. 10: Visualization of Multi-Level Structured Perceptual Attention Module on Oracle-241 dataset.**

The visualization of this module vividly illustrates how focusing on different levels of detail enhances the model’s understanding of Oracle Bone Script symbols. By adjusting the attention mechanism, the model is able to recognize and emphasize details crucial for classification, thereby maintaining high accuracy in complex or blurry images. This layered attention strategy is key to enhancing the model’s ability to deeply analyze image content.

These visualizations not only validate the model’s capability to handle complex transformations specific to the unique features of Oracle Bone Script but also demonstrate how each module contributes to comprehensively improving image quality and readability. This holistic approach ensures that the model is not only effective in identifying and interpreting the scripts but is also robust against the variations and imperfections commonly found in historical artifacts.

Feature visualization

To evaluate the effectiveness of our proposed model in mitigating domain discrepancies and enhancing feature discriminability, we utilize t-SNE to visualize features extracted from the source domain and adapted target domain of the Oracle Bone Script dataset. As shown in Fig. 11, we present a comparison of the feature distributions before and after domain adaptation. This visualization helps in assessing how well the model has aligned the features from both domains, crucial for ensuring that the learning transfers effectively across different data conditions and enhances the model’s ability to generalize to new, unseen data while maintaining high accuracy and robustness.

Before domain adaptation, the features extracted by the source model are relatively dispersed, indicating a significant domain shift between the source and target domains. After adaptation with our approach, features from the two domains become more mixed and harder to distinguish, demonstrating that our method effectively reduces domain discrepancy. This blending of features across domains validates the model’s adaptability and its potential for handling domain-specific variation, which is crucial in real-world applications where domain variability is common.

Error analysis

In this section, we explore the scenarios in which our model, which integrates the ADM, TSDM, and MLSPAM, fails to accurately classify Oracle Bone Script images. Although our model generally performs better than traditional “Source-only” models, it is not without its shortcomings, particularly when dealing with complex artificial marks in images.

(1) ADM limitations. While the ADM is designed to more closely align source domain images with target domain images, it occasionally causes misalignment of features crucial for correct classification. For instance, some characters may undergo excessive deformation, resulting in the loss of key structural details needed to distinguish similar characters. As shown in Fig. 12a, this sometimes leads to misclassification of characters with subtle differences.

**Fig. 12: Error analysis of our model.**

(2) TSDM challenges. The TSDM effectively separates texture from structural information, which usually aids the recognition process. However, this module may struggle with images where texture features are severely degraded due to noise. As depicted in Fig. 12b, in such cases, the module may fail to accurately reconstruct fundamental structural information, leading to errors when recognizing characters that heavily rely on these details.

(3) MLSPAM errors. This module is designed to focus on relevant features across multiple scales to improve discrimination between character categories. However, it can be overwhelmed by heavy blurring or similar distortions within the same category. In such cases the attention mechanism may focus on the wrong parts of the image, hindering accurate classification, as shown in Fig. 12c.

(4) Comparison with similar correct cases. Figure 12d shows correctly classified examples of similar characters, in which the structural features remain discernible despite the noise, allowing the model to identify them correctly.

(5) General observations. Over 50% of topographic images exhibit severe deformations and noise, which still pose a challenge for our model. Although the ADM adjusts shapes and alignment, and the TSDM attempts to clarify the distinction between texture and structural features, the presence of blurred and obstructed images can lead to serious classification errors. Additionally, the similarity between certain characters and the appearance of characters as sub-components in others can also cause model confusion, thereby exacerbating the error rate.

Performance across varying training data scales

To assess OracleNet’s robustness under data scarcity, we conducted comparative experiments on the Oracle-241 dataset using varying percentages of its available training data. As shown in Table 5, recognition accuracy generally improved with more training data, demonstrating the model’s capacity to leverage increased supervision. Specifically, when utilizing only 25% of the training data, OracleNet achieved a target accuracy of 52.1 ± 0.5%. This performance progressively rose to 60.3 ± 0.4% with 50% of the data, and further to 63.5 ± 0.3% when 75% of the data was used. At 100% of the training data, OracleNet maintained its established high performance of 64.7 ± 0.3%. These results indicate that while more data naturally leads to better performance, OracleNet exhibits promising capabilities even with limited training resources, which is crucial for real-world historical script applications where data annotation is often scarce.
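The text does not say how the 25%, 50%, and 75% subsets were drawn. One common choice, shown here as an assumption, is a stratified random subsample that keeps the same fraction of each class so that all categories remain represented. The function name and the (item, label) input format are hypothetical.

```python
import math
import random
from collections import defaultdict


def stratified_subset(samples, fraction, seed=0):
    """Keep `fraction` of each class (at least one sample per class).

    `samples` is a list of (item, label) pairs; returns a new list.
    """
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    by_label = defaultdict(list)
    for item, label in samples:
        by_label[label].append((item, label))
    rng = random.Random(seed)           # fixed seed for reproducible subsets
    subset = []
    for label in sorted(by_label):      # deterministic class order
        group = by_label[label]
        k = max(1, math.floor(len(group) * fraction))
        subset.extend(rng.sample(group, k))
    return subset
```

Stratifying matters for a 241-class problem: a plain random 25% draw could leave rare categories with no training examples at all, which would confound the effect of data scale with missing classes.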

Table 5 OracleNet performance with varying training data scales on Oracle-241 dataset (target accuracy)

Robustness to varying image degradation levels

To explicitly quantify OracleNet’s resilience to image complexity, noise, and blurring (common issues in Oracle Bone Script images), we evaluated it on the Oracle-MNIST dataset under controlled synthetic degradation (Table 6), adding increasing levels of Gaussian blur to the test set and measuring target accuracy. With no degradation, OracleNet achieved 97.2 ± 0.2%. Under low, medium, and high levels of Gaussian blur, accuracy declined gradually to 96.5 ± 0.2%, 95.1 ± 0.3%, and 93.2 ± 0.5%, respectively. This gradual, bounded reduction highlights OracleNet’s robustness: it maintains strong recognition even when images are significantly degraded in ways often encountered in historical artifacts, which we attribute largely to the ADM and TSDM.
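The text does not give the kernel widths behind the low, medium, and high blur levels. The sketch below applies a separable Gaussian blur to a grayscale image stored as a list of rows, with the standard deviation `sigma` as the severity knob; borders are handled by clamping coordinates to the image edge. The sigma values in the usage note are illustrative assumptions, not the settings behind Table 6.

```python
import math


def gaussian_kernel(sigma, radius=None):
    """1-D Gaussian weights normalized to sum to 1."""
    if sigma <= 0:
        return [1.0]                           # no blur
    if radius is None:
        radius = max(1, math.ceil(3 * sigma))  # +/- 3 sigma covers ~99.7% of the mass
    weights = [math.exp(-(i * i) / (2 * sigma * sigma))
               for i in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def blur(image, sigma):
    """Separable Gaussian blur with edge clamping; image is a list of rows."""
    kernel = gaussian_kernel(sigma)
    r = len(kernel) // 2
    h, w = len(image), len(image[0])

    def clamp(v, hi):
        return min(max(v, 0), hi - 1)

    # Horizontal pass, then vertical pass: the 2-D Gaussian factorizes.
    tmp = [[sum(kernel[k] * image[y][clamp(x + k - r, w)] for k in range(len(kernel)))
            for x in range(w)] for y in range(h)]
    return [[sum(kernel[k] * tmp[clamp(y + k - r, h)][x] for k in range(len(kernel)))
             for x in range(w)] for y in range(h)]
```

For a 28 × 28 Oracle-MNIST image `img`, a graded test set could be produced with, for example, `{name: blur(img, s) for name, s in {"low": 0.5, "medium": 1.0, "high": 2.0}.items()}`, where the three sigma values are hypothetical.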

Table 6 OracleNet robustness to synthetic image degradation on Oracle-MNIST dataset (target accuracy)

Discussion

While OracleNet demonstrates significant advancements in Oracle Bone Script recognition, it is important to acknowledge its limitations and potential areas for future improvement.

Firstly, limitations in robustness to specific noise or deformations: While OracleNet exhibits robustness to various types of noise and deformation, it may still face challenges with extremely severe or specific types of noise or distortions not well represented in the training data. As observed in the error analysis, excessive blurring or highly unusual deformation patterns can still lead to misclassification. Enhancing robustness to these extreme conditions remains a direction for future research.

Secondly, challenges in distinguishing visually similar characters: Despite the MLSPAM, discriminating between visually highly similar Oracle Bone Script characters remains a persistent challenge. Subtle visual differences between closely related character categories, especially under image degradation, can still be difficult for the model to discern. Future work could explore incorporating finer-grained feature extraction and contrastive learning approaches to address this limitation.

Addressing these limitations will be a focus of our future research, with the aim of further improving the robustness and accuracy of Oracle Bone Script recognition models.

In this study, we introduced OracleNet, a novel approach to Oracle Bone Script recognition, integrating the ADM, TSDM, and MLSPAM. OracleNet achieves significant performance improvements, demonstrating superior accuracy and robustness in Oracle Bone Script recognition. Specifically, OracleNet achieves state-of-the-art performance on the Oracle-241, OBC306 and Oracle-MNIST datasets.

These performance gains are attributed to the innovative design of OracleNet, which integrates three key modules: the ADM, enabling finer local deformation control and preserving semantic integrity; the TSDM, effectively separating texture and structural features to enhance recognition accuracy; and the MLSPAM, refining feature discrimination through macro and micro perspectives. The synergistic interaction of these modules allows OracleNet to effectively address the challenges of Oracle Bone Script recognition, including complex deformations, noise, and subtle structural variations. Extensive experimental results and ablation studies validate the effectiveness and contribution of each module in OracleNet.

Beyond improving recognition accuracy, the practical deployment of OracleNet in cultural heritage protection faces several key challenges. Technically, the model must contend with extreme and novel forms of degradation not fully captured in existing datasets, as well as the fine differentiation of highly similar characters that possess minimal structural variance. Non-technical challenges are equally critical: ethically, the application of AI must prioritize the integrity and authenticity of cultural artifacts, ensuring that automated interpretations supplement, rather than supplant, expert human scholarship, and avoid any form of misrepresentation. This necessitates transparent and explainable AI systems. Furthermore, for effective integration into archaeological and historical workflows, careful consideration must be given to the user interaction experience. Future developments should focus on creating intuitive interfaces that allow cultural heritage professionals to easily input data, review recognition results, provide feedback for refinement, and visualize the model’s decision-making process. Such user-centric design and a clear ethical framework are paramount to realizing the full practical and social value of OracleNet in aiding the preservation and study of Oracle Bone Script.

Looking ahead, future research directions include further enhancing the robustness of OracleNet to extreme levels of noise and deformation, particularly those not well represented in current datasets; exploring the application of OracleNet or its modular design to other historical text recognition tasks, such as ancient scripts from other cultures or degraded document images; and investigating model compression techniques to facilitate deployment in resource-constrained environments. These directions aim to further improve the accuracy, robustness, and applicability of Oracle Bone Script recognition technology, contributing to the advancement of archaeological and historical studies.

Data availability

The datasets used during the current study are available from http://jgw.aynu.edu.cn/.

Code availability

The code for OracleNet model and the experimental scripts are publicly available at https://github.com/cooljeremy/OracleNet.

References

Flad, R. K. Divination and power: a multiregional view of the development of oracle bone divination in early China. Curr. Anthropol. 49, 403–437 (2008).
Article Google Scholar
Zhao, G., Ahonen, T., Matas, J. & Pietikainen, M. Rotation-invariant image and video description with local binary pattern features. IEEE Trans. Image Process. 21, 1465–1477 (2011).
Wang, Y.-S., Tai, C.-L., Sorkine, O. & Lee, T.-Y. Optimized scale-and-stretch for image resizing. In ACM SIGGRAPH Asia 2008 Papers 1–8 (ACM, 2008).
Sasada, M. & Togashi, T. Measurement of rollover in double-sided shearing using image processing and influence of clearance. Procedia Eng. 81, 1139–1144 (2014).
Andersson, P. et al. FLIP: a difference evaluator for alternating images. Proc. ACM Comput. Graph. Interact. Tech. 3, 15 (2020).
Wang, M. et al. Study on the evolution of Chinese characters based on few-shot learning: from Oracle bone inscriptions to regular script. PLoS ONE 17, e0272974 (2022).
Chen, T. et al. A study on encoding-based Oracle bone script recognition. J. Chin. Writ. Syst. 4, 281–290 (2020).
Sederberg, T. W. & Parry, S. R. Free-form deformation of solid geometric models. In Proc. 13th Annual Conference on Computer Graphics and Interactive Techniques 151–160 (ACM, 1986).
Periaswamy, S. & Farid, H. Elastic registration in the presence of intensity variations. IEEE Trans. Med. Imaging 22, 865–874 (2003).
Wang, X., Ye, Y. & Gupta, A. Zero-shot recognition via semantic embeddings and knowledge graphs. In Proc. IEEE Conference on Computer Vision and Pattern Recognition 6857–6866 (IEEE, 2018).
Ganin, Y. & Lempitsky, V. Unsupervised domain adaptation by backpropagation. In International Conference on Machine Learning 1180–1189 (PMLR, 2015).
Bousmalis, K., Trigeorgis, G., Silberman, N., Krishnan, D. & Erhan, D. Domain separation networks. In Advances in Neural Information Processing Systems 29 (NIPS, 2016).
Wang, M., Deng, W. & Liu, C.-L. Unsupervised structure-texture separation network for Oracle character recognition. IEEE Trans. Image Process. 31, 3137–3150 (2022).
Wang, M., Deng, W. & Su, S. Oracle character recognition using unsupervised discriminative consistency network. Pattern Recognit. 148, 110180 (2024).
Zhang, Y.-K., Zhang, H., Liu, Y.-G., Yang, Q. & Liu, C.-L. Oracle character recognition by nearest neighbor classification with deep metric learning. In 2019 International Conference on Document Analysis and Recognition (ICDAR) 309–314 (IEEE, 2019).
Han, Z., Sun, H. & Yin, Y. Learning transferable parameters for unsupervised domain adaptation. IEEE Trans. Image Process. 31, 6424–6439 (2022).
Na, J., Jung, H., Chang, H. J. & Hwang, W. FixBi: bridging domain spaces for unsupervised domain adaptation. In Proc. IEEE/CVF Conference on Computer Vision and Pattern Recognition 1094–1103 (IEEE, 2021).
Hu, D., Liang, J., Hou, Q., Yan, H. & Chen, Y. Adversarial domain adaptation with prototype-based normalized output conditioner. IEEE Trans. Image Process. 30, 9359–9371 (2021).
Chen, X., Wang, S., Long, M. & Wang, J. Transferability vs. discriminability: batch spectral penalization for adversarial domain adaptation. In International Conference on Machine Learning 1081–1090 (PMLR, 2019).
Li, J., Wang, Q.-F., Zhang, R. & Huang, K. Mix-up augmentation for Oracle character recognition with imbalanced data distribution. In Document Analysis and Recognition–ICDAR 2021: 16th International Conference, Proceedings, Part I 237–251 (Springer, 2021).
Wang, M. & Deng, W. A dataset of Oracle characters for benchmarking machine learning algorithms. Sci. Data 11, 87 (2024).
Long, M., Cao, Y., Wang, J. & Jordan, M. Learning transferable features with deep adaptation networks. In Proc. 32nd International Conference on Machine Learning (eds Bach, F. & Blei, D.) 97–105 (PMLR, 2015).
Long, M., Cao, Z., Wang, J. & Jordan, M. I. Conditional adversarial domain adaptation. In Advances in Neural Information Processing Systems 31 (NIPS, 2018).
Xie, S., Zheng, Z., Chen, L. & Chen, C. Learning semantic representations for unsupervised domain adaptation. In International Conference on Machine Learning 5423–5432 (PMLR, 2018).
Zhang, Y., Hooi, B., Hong, L. & Feng, J. Self-supervised aggregation of diverse experts for test-agnostic long-tailed recognition. Adv. Neural Inf. Process. Syst. 35, 34077–34090 (2022).
Cui, J., Liu, S., Tian, Z., Zhong, Z. & Jia, J. ResLT: residual learning for long-tailed recognition. IEEE Trans. Pattern Anal. Mach. Intell. 45, 3695–3706 (2022).
Cui, J., Zhong, Z., Liu, S., Yu, B. & Jia, J. Parametric contrastive learning. In Proc. IEEE/CVF International Conference on Computer Vision 715–724 (IEEE, 2021).
Li, J. et al. Towards better long-tailed Oracle character recognition with adversarial data augmentation. Pattern Recognit. 140, 109534 (2023).
Simonyan, K. & Zisserman, A. Very deep convolutional networks for large-scale image recognition. Preprint at https://arxiv.org/abs/1409.1556 (2014).
Krizhevsky, A., Sutskever, I. & Hinton, G. E. ImageNet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems 25 (NIPS, 2012).
He, K., Zhang, X., Ren, S. & Sun, J. Deep residual learning for image recognition. In Proc. IEEE Conference on Computer Vision and Pattern Recognition 770–778 (IEEE, 2016).
Szegedy, C., Vanhoucke, V., Ioffe, S., Shlens, J. & Wojna, Z. Rethinking the inception architecture for computer vision. In Proc. IEEE Conference on Computer Vision and Pattern Recognition 2818–2826 (IEEE, 2016).
Ganj, A., Ebadpour, M., Darvish, M. & Bahador, H. LR-Net: a block-based convolutional neural network for low-resolution image classification. Iran. J. Sci. Technol. Trans. Electr. Eng. 47, 1561–1568 (2023).


Acknowledgements

This work is supported by the National Natural Science Foundation of China (Grant No. 72074108), the Fundamental Research Funds for the Central Universities at Nanjing University (Grant No. 010814370338), Jiangsu Young Talents in Social Sciences, and Tang Scholar of Nanjing University.

Author information

Authors and Affiliations

School of Information Management, Nanjing University, Nanjing, China
Shu Zhou, Jingwen Qiu, Wenru Bu & Hao Wang
Jiangsu International Joint Informatics Laboratory, Nanjing University, Nanjing, China
Shu Zhou, Jingwen Qiu, Wenru Bu & Hao Wang
Baidu Inc., Beijing City, China
Xin Wang

Contributions

Shu Zhou was primarily responsible for conceptualization, methodology, investigation, software, formal analysis, and writing (original draft). Xin Wang performed data collection and data analysis and checked and revised the manuscript. Jingwen Qiu and Wenru Bu were responsible for writing the paper. Hao Wang provided supervision and acquired funding.

Corresponding author

Correspondence to Hao Wang.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.


About this article

Cite this article

Zhou, S., Wang, X., Qiu, J. et al. OracleNet: enhancing Oracle Bone Script recognition with Adaptive Deformation and Texture-Structure Decoupling. npj Herit. Sci. 13, 273 (2025). https://doi.org/10.1038/s40494-025-01839-z


Received: 02 December 2024
Accepted: 30 May 2025
Published: 17 June 2025
Version of record: 17 June 2025
DOI: https://doi.org/10.1038/s40494-025-01839-z

This article is cited by

Human–computer collaborative approach to the decipherment of oracle bone inscriptions with generative adversarial networks
- Shaoting Zeng
- Jiayue Bai
- Yifeng Shi
npj Heritage Science (2026)
Prism-OBI: a novel framework for oracle bone inscription recognition via visual perception and feature decoupling
- Jia Wen Li
- Jia Rui He
- Rong Jun Chen
npj Heritage Science (2026)
Adaptive multi-feature fusion for visible-infrared image registration and character enhancement of bamboo slips
- Teng Wan
- Fengchen Qi
- Shaoyi Du
npj Heritage Science (2026)
Visualizing poetry with deep semantic understanding and consistency evaluation
- Churuo Xu
- Shu Zhou
npj Heritage Science (2025)

