Fig. 4: RoBERTa model.
From: Knowledge graph-based intelligent question answering system for ancient Chinese costume heritage

Note: This diagram presents the architecture of an NLP system based on the RoBERTa model, illustrating the workflow from text input to task-specific processing and prediction. The input stage begins with tokenization, in which the text is split into tokens and special markers (CLS and SEP) are introduced to delineate the structure of the text. These tokens are then mapped to numerical word vectors via embedding, producing input embeddings that represent the text in a format suitable for model processing. The core processing stage consists of stacked Transformer modules, whose multi-head attention mechanisms capture complex interdependencies between words while feed-forward networks refine the feature representations; residual connections and normalization (Add & Norm) further stabilize training. In the output stage, a task classifier uses the features extracted from the Transformer, particularly the CLS vector representing the overall semantic information, to assign the input text to a specific task category. Text prediction then generates task-specific outputs, such as text generation or sentiment analysis results, from the model's processed representations.
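The three stages described above (tokenization with special markers, Transformer processing with Add & Norm, and CLS-based classification) can be sketched in miniature with NumPy. This is a toy illustration under stated assumptions, not the actual RoBERTa implementation: the vocabulary, dimensions, random weights, and single Transformer block are all hypothetical placeholders chosen only to make the data flow concrete.

```python
import numpy as np

rng = np.random.default_rng(0)

# --- Input stage: tokenize and add special markers (toy vocabulary, not RoBERTa's) ---
vocab = {"[CLS]": 0, "[SEP]": 1, "ancient": 2, "costume": 3, "heritage": 4}
tokens = ["[CLS]", "ancient", "costume", "heritage", "[SEP]"]
ids = np.array([vocab[t] for t in tokens])

d_model, n_heads = 8, 2          # tiny dimensions for illustration only
d_head = d_model // n_heads

# Embedding lookup: token ids -> input embeddings of shape (seq_len, d_model)
embedding = rng.normal(size=(len(vocab), d_model))
x = embedding[ids]

def layer_norm(z, eps=1e-5):
    return (z - z.mean(-1, keepdims=True)) / (z.std(-1, keepdims=True) + eps)

def softmax(z):
    e = np.exp(z - z.max(-1, keepdims=True))
    return e / e.sum(-1, keepdims=True)

# --- Core stage: one Transformer block (multi-head attention + FFN, Add & Norm) ---
Wq, Wk, Wv, Wo = (rng.normal(size=(d_model, d_model)) for _ in range(4))
W1 = rng.normal(size=(d_model, 4 * d_model))
W2 = rng.normal(size=(4 * d_model, d_model))

def multi_head_attention(h):
    q, k, v = h @ Wq, h @ Wk, h @ Wv
    heads = []
    for i in range(n_heads):
        s = slice(i * d_head, (i + 1) * d_head)
        scores = q[:, s] @ k[:, s].T / np.sqrt(d_head)  # scaled dot-product
        heads.append(softmax(scores) @ v[:, s])
    return np.concatenate(heads, axis=-1) @ Wo

h = layer_norm(x + multi_head_attention(x))             # attention + Add & Norm
h = layer_norm(h + np.maximum(h @ W1, 0) @ W2)          # feed-forward + Add & Norm

# --- Output stage: task classifier reads the CLS vector (position 0) ---
n_classes = 3                                           # assumed number of task categories
W_cls = rng.normal(size=(d_model, n_classes))
logits = h[0] @ W_cls                                   # CLS vector -> task logits
pred = int(np.argmax(logits))
print("predicted class:", pred)
```

In a real deployment each of these pieces is learned during pretraining and fine-tuning, and a full model stacks many such Transformer blocks; the sketch only traces how a tokenized input flows to a CLS-based prediction.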