Abstract
In natural language processing, document-level relation extraction is a complex task that aims to predict the relationships among entities by capturing contextual interactions from an unstructured document. Existing graph- and transformer-based models capture long-range relational facts across sentences. However, they still cannot fully exploit the semantic information from multiple interactive sentences, resulting in the exclusion of influential sentences for related entities. To address this problem, a novel Semantic-guided Attention and Adaptively Gated (SAAG) model is developed for document-level relation extraction. First, a semantic-guided attention module is designed to guide sentence representation by assigning different attention scores to different words. The multihead attention mechanism is then used to capture the attention of different subspaces further to generate a document context representation. Finally, the SAAG model exploits the semantic information by leveraging a gating mechanism that can dynamically distinguish between local and global contexts. The experimental results demonstrate that the SAAG model outperforms previous models on two public datasets.
Introduction
The purpose of document-level relation extraction (DocRE) is to extract all possible semantic relations among multiple entities from a given original text. It is widely used in downstream applications, such as knowledge graph construction1, recommendation systems2, and question answering3. Early research mainly focused on sentence-level relation extraction. However, most relations can only be expressed through multiple sentences and cannot be extracted by sentence-level relation extraction. Therefore, researchers have recently extended their efforts to DocRE.
Many researchers have adopted graph-based neural networks4, heuristic structures5, or attention mechanisms6 for DocRE. Graph-based neural-network models construct a graph over a document to connect different nodes and aggregate their information, and pretrained models encode the input document into contextual representations and obtain state-of-the-art performance. However, previous models have focused on integrating sentence or document contextual information to enhance representations of different granularities while ignoring the effect of individual words on contextual semantics within sentences. Figure 1 shows an example from the document-level relation extraction dataset (DocRED)7, in which we need to predict the inter-sentence relation “country” between the entity “Monticello High School” (red font) and the entity “U.S.” (blue font). Accurate entity representations facilitate relational judgments. For the subject entity “Monticello High School” in sentence 4, we observe that different words contribute unequally to the subject entity. Specifically, “Illinois” contributes more to the subject entity than other words. Likewise, the word “Galesburg” contributes more than the word “1886”. Hence, identifying critical words that carry semantic information is essential for relation extraction. In this study, to address the issue of semantic information ambiguity, a semantic-guided attention technique is employed that focuses not only on the semantic information of the sentences containing the entities but also on the semantic information of the entire document. Furthermore, to distinguish more accurately the semantic information required for different entity representations, an adaptive gating mechanism is introduced.
Combining the semantic-guided attention and adaptive gating techniques produces the following contributions:
- Semantic-guided attention is developed, which captures the critical words that determine crucial semantic information in sentences.
- A novel mechanism is designed for information aggregation: an adaptive gating mechanism that dynamically fuses semantic sentence features and document topic information.
- Experiments demonstrate that the SAAG model is more competitive than existing models on two public DocRE datasets. We also perform thorough analyses to show that our model is interpretable.
Related work
In recent years, most models have focused on document graph construction and path-reasoning mechanisms in DocRE. Zeng et al.8 introduced double graphs with different granularities to solve the problems of complex interactions and path reasoning. Huang et al.9 constructed a relational knowledge graph that learns from known knowledge to infer unknown relations. Zeng et al.10 designed a new mention-level graph by adding a document node and mention nodes and introduced a logical reasoning module to address logical reasoning problems more pertinently. Zhou et al.11 proposed adaptive thresholding and localized context pooling to address multiple entity pairs and relation types, respectively. Xu et al.12 proposed a structured self-attention network to achieve structural and contextual reasoning simultaneously under the guidance of a model. Xie et al.13 proposed an evidence-enhanced framework that focuses on the important sentences by extracting and fusing evidence. Yu et al.14 introduced the RSMAN model to obtain flexible entity representations by incorporating attentive features. Yuan et al.15 proposed an optimization method based on knowledge relation reasoning that performs global reasoning on relational prompts through constraints from semantic segmentation. Han et al.16 proposed two co-occurrence prediction subtasks, from coarse-grained and fine-grained perspectives, to capture relational relevance; the learned relation-aware embeddings are then used to guide the extraction of relational facts. Moreover, previous studies have employed attention mechanisms to model the semantic information of mentions or key sentences17,18; however, none of these methods considers that the words constituting a sentence contribute unequally. Using the same attention score indiscriminately affects the accuracy of the sentence representation and the performance of DocRE.
Unlike previous research, in this study, an attempt is made to address this problem by developing a novel Semantic-guided Attention and Adaptive Gating (SAAG) model to enhance DocRE performance. As shown in Fig. 2, SAAG consists of two fundamental techniques to achieve semantic enhancement and adaptive entity representation: (1) a semantic-guided attention technique that incorporates word features to generate more-focused sentence representations and (2) an adaptive gating technique that better distinguishes between intersentence and intrasentence relation extraction.
The overall architecture of the Semantic-guided Attention and Adaptive Gating (SAAG) model. First, the document is input into the encoder. Then, the semantic-guided attention technique is used to obtain the sentence semantic feature representation and the document context representation. Next, the adaptive gating technique is employed to derive the semantic feature information representation. Finally, this representation is combined with the entity representations and the relative distance representation, and the relation probability of the entity pair is obtained through a classifier.
Model
This section provides the overall architecture of the Semantic-guided Attention and Adaptive Gating (SAAG) model. As shown in Fig. 2, we first present the encoding module in “Encoding module”, then detail the semantic-guided attention and adaptive gating module in “Semantic-guided attention and adaptive gating module”, and finally introduce the classification module in “Classification module”.
Encoding module
A given document \(D=\{w_1,w_2,w_3,\ldots ,w_n\}\) is segmented into n words, where \(w_i\) denotes the i-th word in the document, and a context encoder is used to convert the document into a sequence of contextualized representations. Because entity types can enrich word representations and further narrow down the predicted entity relations, entity type embeddings \({E_t}({t_i})\) are concatenated with word embeddings \({E_w}({w_i})\). Following Yao et al.7, we use six entity types: PER (Person), ORG (Organization), LOC (Location), NUM (Number), TIME (Time), and MISC (other types). The semantic embedding representation is obtained by concatenating the two embeddings:
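Written out, with \(x_i\) introduced here to denote the resulting embedding of the i-th word (the symbol is ours), a plausible form of Eq. (1) is:
\(x_i = [E_w(w_i);E_t(t_i)]\)
where \(t_i\) is the entity type of \(w_i\) and [ ; ] denotes concatenation.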
Then, we feed the result of Eq. (1) into the encoder to obtain hidden state vector sequences for each word:
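A plausible form of this step, writing \(H=\{h_1,h_2,\ldots ,h_n\}\) for the hidden states (notation ours), is:
\(\{h_1,h_2,\ldots ,h_n\} = \mathrm {Encoder}(\{x_1,x_2,\ldots ,x_n\})\)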
where the encoder is a BiLSTM or BERT.
Finally, it is necessary to obtain entity representations. Because an entity is expressed by one or more mentions, the mention information must be aggregated to compute the representation of the entity to which the mentions belong. Furthermore, each mention comprises one or more words; therefore, the mention representation must be computed first. Specifically, for entity \(e_i\), let \(E({e_i})\) be the set of mentions of this entity, and let any mention \(m_j\) in the set span from the s-th to the t-th word. Because logsumexp pooling aggregates semantic feature information well, it is used to represent the mention \(m_j\):
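A plausible form of this pooling over the mention span is:
\(m_j = \log \sum \nolimits _{k=s}^{t}\exp (h_k)\)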
For the representation of entity \(e_i\), a similar calculation is used:
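Analogously, a plausible form that pools over the mentions of the entity is:
\(e_i = \log \sum \nolimits _{m_j\in E(e_i)}\exp (m_j)\)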
Semantic-guided attention and adaptive gating module
An SAAG mechanism is designed to aggregate sentence semantic information and dynamically handle intra-sentence and inter-sentence relations. The two techniques are introduced in detail below.
Semantic-guided attention
For a target entity, each word in a sentence contributes differently to predicting its relations, especially for intrasentence entity relations. Therefore, the attention mechanism19 is used to guide sentence semantics by assigning different attention scores to different words. Previous methods20,21,22 either considered words only when computing mentions and entities, without accounting for the impact of each word on the target entity, or considered only the impact on the entity of the sentence in which the entity is located. As a result, these methods capture the semantic feature information of a sentence only roughly, even though each word in the sentence affects the entity.
More-important words are assigned higher attention scores to determine the effect of each word on the target entity in the sentence. Multilayer perceptron attention23 is used to guide the semantic sentence representation. Specifically, first, to better tie the words and entities in the sentence together, it is necessary to concatenate entity \(e_i\) and word \(w_{k}^{i}\) and then apply the softmax function to obtain the attention weight of each word in the sentence on the entity. Finally, the weighted sum of all words in the \(S_m\) sentence for entity \(e_i\) is obtained. This is similar to the case of \(S_{n}^j\).
Assuming that there are p words in the m-th sentence, we obtain the representation of the sentence \(S_m\) in which entity \(e_i\) is located:
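One plausible reconstruction, consistent with the symbols defined below (the exact placement of \(W_1\), \(\sigma (\cdot )\), and the softmax may differ in the original; \(a_k^{i}\) is an intermediate score introduced here), is:
\(a_k^{i} = W_1\,\sigma ([e_i;\hat{h}_k^{i}]) + b_1\)
\(\mu _k^{i} = \exp (a_k^{i})/\sum \nolimits _{l=1}^{p}\exp (a_l^{i})\)
\(S_{m}^{i} = \sum \nolimits _{k=1}^{p}\mu _k^{i}\,\hat{h}_k^{i}\)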
where [ ; ] is the concatenation operation, \(\sigma (\cdot )\) is the sigmoid function, \(W_1\) and \(b_1\) are a learnable weight and bias, and \(\mu _k^{i}\) is the attention score of \(\hat{h_k^{i}}\).
By concatenating \(S_{m}^i\) and \(S_{n}^j\), we can get the sentence semantic representation of entity pair \((e_i,e_j)\):
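That is, with \(S_{(i,j)}\) denoting the pair-level sentence semantic representation (notation ours):
\(S_{(i,j)} = [S_{m}^{i};S_{n}^{j}]\)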
In addition, to match textual contexts and filter out irrelevant semantic information for the target entity pair \((e_i,e_j)\), the multihead attention mechanism is used to capture the attention of different subspaces toward the target entity pair, and an entity-pair-aware attention strategy is introduced to generate the document context representation \(c_{(i,j)}\):
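Because this description closely mirrors the localized context pooling of Zhou et al.11, one plausible formulation (notation adapted; \(q_{(i,j)}\) is an intermediate importance vector and \(\circ\) the element-wise product, both introduced here) is:
\(q_{(i,j)} = \sum \nolimits _{d=1}^{D} A_i^{(d)}\circ A_j^{(d)}\)
\(c_{(i,j)} = H^{\top }\,q_{(i,j)}/(\mathbf {1}^{\top }q_{(i,j)})\)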
where \(A_i\) and \(A_j\) are the attention weights of all words on entities \(e_i\) and \(e_j\), respectively, D is the number of attention heads, H is the embedding representation of the entire document, and \(\odot\) denotes matrix multiplication.
Adaptive gating
Based on the work of Yuan et al.22, an adaptive gating mechanism is designed to dynamically fuse sentence and context feature representations. Therefore, the proposed model can select semantic sentence information to solve the intrasentence relation extraction task and enrich document context information to solve the intersentence relation extraction task. This technique better captures local–global contextual information and promotes the performance of DocRE.
The gating mechanism adaptively adjusts the sentence semantic representation and the document context representation, as in Eqs. (12) and (13), for the target entity pair \((e_i,e_j)\):
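A plausible form of Eqs. (12) and (13), with \(g_{(i,j)}\) denoting the gate (the exact inputs of the gate may differ in the original), is:
\(g_{(i,j)} = \sigma (W_g[S_{(i,j)};c_{(i,j)}] + b_g)\)    (12)
\(G_{(i,j)} = g_{(i,j)}\odot S_{(i,j)} + (1-g_{(i,j)})\odot c_{(i,j)}\)    (13)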
where \(\sigma (\cdot )\) denotes the sigmoid function, \(W_g\) and \(b_g\) are a learnable weight and bias, \(\odot\) is element-wise multiplication, and \(G_{(i,j)}\) is the semantic feature representation of the entity pair \((e_i,e_j)\).
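For concreteness, a minimal PyTorch sketch of this gating step is given below; it assumes the sentence and context representations share the same dimensionality, and the class and variable names are illustrative rather than the authors' implementation.

import torch
import torch.nn as nn

class AdaptiveGate(nn.Module):
    """Fuses a sentence-level feature with a document-level feature via a learned gate."""

    def __init__(self, dim: int):
        super().__init__()
        # Corresponds to W_g and b_g: the gate is computed from the concatenated features.
        self.linear = nn.Linear(2 * dim, dim)

    def forward(self, s_pair: torch.Tensor, c_pair: torch.Tensor) -> torch.Tensor:
        # s_pair: sentence semantic representation S_(i,j), shape (batch, dim)
        # c_pair: document context representation c_(i,j), shape (batch, dim)
        gate = torch.sigmoid(self.linear(torch.cat([s_pair, c_pair], dim=-1)))
        # Element-wise interpolation between the local (sentence) and global (document) context.
        return gate * s_pair + (1.0 - gate) * c_pair

# Usage sketch: fuse a batch of 4 entity-pair features of dimension 256.
fuse = AdaptiveGate(dim=256)
G = fuse(torch.randn(4, 256), torch.randn(4, 256))  # shape (4, 256)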
Classification module
In the DocRE task, relation extraction is a multilabel classification problem. Specifically, for the entity pair \((e_i,e_j)\), the entity representations are concatenated with the semantic feature information representation and the relative distance representation to form the final representation of the target entity pair:
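One plausible form, writing \(z_{(i,j)}\) for the final pair representation (notation ours), is:
\(z_{(i,j)} = [e_i;e_j;G_{(i,j)};D_{(i,j)}]\)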
where \(D_{(i,j)}\) is the relative distance representation computed from the first mentions of the two entities in the document.
Finally, we use a sigmoid function to calculate the probability of each relation r:
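A plausible form, reusing \(z_{(i,j)}\) from above, is:
\(P(r\mid e_i,e_j) = \sigma (W_r z_{(i,j)} + b_r)\)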
where \(W_r\) and \(b_r\) are a learnable weight matrix and bias vector.
We use binary cross-entropy loss to train our model and apply the Adam optimizer24 during training. The loss function is formulated as follows:
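A standard binary cross-entropy formulation consistent with this description (summed over all entity pairs; the exact normalization may differ in the original) is:
\(\mathcal {L} = -\sum \nolimits _{(e_i,e_j)}\sum \nolimits _{r=1}^{R}\big [p_r\log P(r\mid e_i,e_j) + (1-p_r)\log (1-P(r\mid e_i,e_j))\big ]\)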
where R is the number of predefined relation types and \(p_r \in \{0, 1\}\) is the ground-truth label.
Experiment
Dataset
The proposed SAAG method is evaluated on two public DocRE datasets: (1) DocRED: created by Yao et al.7, a large-scale dataset for DocRE that includes 3053/1000/1000 documents for training, development, and testing, respectively, with a total of 96 relation types. (2) CDR: the chemical-disease reactions dataset introduced by Li et al.25, a widely used DocRE dataset in the biomedical domain, which contains 1500 documents split into three equal-sized sets for training, development, and testing. Unlike DocRED, CDR focuses on only one manually annotated relation, “chemical-disease”.
Compared models
Our SAAG model is compared with the following classic and competitive methods on the two public DocRE datasets, which fall into three types: (1) Sequence-based models. These models leverage various neural architectures to encode the given document into a hidden state sequence, including convolutional neural networks (CNN)7, long short-term memory (LSTM)7, and bidirectional long short-term memory (BiLSTM)7. (2) Graph-based models. These models convert the given document into a graph structure to explicitly tackle long-distance dependencies, including EOG4, GAIN8, and HeterGSAN26. (3) BERT-based models. These models adopt pretrained models such as BERT27 to implicitly capture long-distance dependencies and improve prediction, including ATLOP\(_{bert}\)11, SSAN\(_{bert}\)12, DISCO\(_{bert}\)21, MRN\(_{bert}\)28, EIDER\(_{bert}\)13, AFLKD\(_{bert}\)29, GRACR\(_{bert}\)17, CPT-R\(_{bert}\)15, DREEAM\(_{bert}\)18, DRETC\(_{bert}\)30, Correl\(_{bert}\)16, and DRE-EA\(_{bert}\)31.
Implementation details
The proposed SAAG model is implemented with Hugging Face Transformers32 under PyTorch33, following most of the experimental settings of previous work5,7. SAAG\(_{glove}\) applies 100-dimensional GloVe34 embeddings and a 128-dimensional BiLSTM35 as the word embedding and encoder. SAAG\(_{bert}\) uses uncased BERT-Base as the encoder with a learning rate of 1e−5, and the word embeddings of uncased BERT-Base are mapped to a 128-dimensional space by a linear layer. For CDR, we apply SciBERT36 as the encoder. Early stopping is applied based on the maximum F1 value on the dev set, with the corresponding epoch and threshold \(\theta\). We adopt Adam24 as the model optimizer for 200 epochs with a weight decay of 0.0001. All hyperparameters are tuned according to the results on the dev set.
In addition to the widely used F1 value, another evaluation metric, Ign F1, is used for DocRED. Yao et al.7 proposed Ign F1, which is computed after removing relational triples shared with the training set; this avoids inflated performance caused by leakage from the training set.
Experiments and results
DocRED results
Comparative experiments are conducted with three different model types using DocRED. Detailed experimental results are presented in Table 1. Among the models that do not use BERT as an encoder, SAAG with GloVe obtains 58.63 F1 on the test set and outperforms the best model, MRN, by 0.17 points. This validates the necessity of incorporating sentence semantic features and document topic information for DocRE.
The SAAG model, which uses BERT as an encoder, surpasses nearly all the models in terms of the F1 value and is only slightly behind the most competitive models. This is primarily because of the demonstrated strength of the DRE-EA model in evidence reasoning; the proposed model does not incorporate reasoning techniques, which results in a performance disadvantage compared with the DRE-EA model in relation extraction tasks that require reasoning. Additionally, among the compared models, the DREEAM model uses an attention mechanism that focuses on evidence sentences. Similarly, the EIDER model extracts and enhances evidence sentences to find supporting sentences for entity relations; however, it does not achieve better performance in DocRE. This is primarily because the attention mechanism in such models focuses on evidence sentences, whereas the evidence for extracting intersentence entity relations often comes from the entire document. Some models that use reasoning approaches for relation extraction, such as GRACR and CPT-R, rely on entity-level reasoning; this results in a larger modeling granularity, which limits their ability to capture finer-grained semantics, so their performance in this respect is relatively low. In contrast, the SAAG model focuses on the influence of words on entities, leading to better performance in DocRE.
CDR results
Table 2 shows the detailed results on the test set of the CDR dataset. The proposed SAAG model obtains the best results; SAAG\(_{SciBert}\) exceeds the best baseline, EIDER, by 0.29 F1. This suggests that SAAG has stronger predictive power in DocRE tasks, which results from the two techniques of semantic-guided attention and adaptive gating: combining them further enhances relation extraction performance, even on biomedical texts.
Ablation study
To analyze the effect of the proposed SAAG model, an additional ablation study is conducted on the DocRED dev set by excluding each module of SAAG in turn. Table 3 lists the results, from which the following can be observed: (1) Sentence semantic representation, document context representation, and the adaptive gating mechanism are essential components of the SAAG model. When each of these modules is removed from the SAAG model, the F1 value decreases by 2.53, 1.15, and 1.07, respectively, indicating that they play different roles in the SAAG model. (2) F1 drops sharply by 2.53 on the dev set when sentence semantic representations are removed. This result is consistent with the fact that nearly 60% of the relations in DocRED are intrasentence relations, and it shows that the proposed sentence semantic representation module addresses intrasentence relations more effectively. (3) F1 decreases by 1.15 on the dev set when document context representations are removed, which shows that document context representation plays a crucial role in capturing document-aware features. (4) F1 decreases by 1.07 on the dev set when the adaptive gating mechanism is removed, from which one can conclude that the adaptive gating mechanism provides an effective technique to dynamically adjust sentence semantic information and document context information and to address the intrasentence and intersentence relation problems in DocRE in a more targeted manner.
Intra- and inter-sentence relation extraction
The results of the ablation experiments demonstrate that the sentence semantic and document context information, combined through the adaptive gating mechanism, are specially designed to address intra-sentence and inter-sentence relation extraction. Next, comprehensive experiments are conducted to further verify the performance of the model on intrasentence and intersentence relation extraction on the DocRED dev set.
The problem of distinguishing whether an entity pair has an intra-sentence or inter-sentence relationship must be addressed. When the relations of an entity pair can be predicted from a single sentence, the entity pair expresses intrasentence relations. Otherwise, it expresses intersentence relations. Unlike intrasentence relations, intersentence relations usually require complex interactions across multiple sentences throughout a document.
Figure 3 shows the intra-F1 and inter-F1 results for the dev set. One can observe that SAAG performs better than the previous models. For example, compared with the MRN model, the SAAG model improves the Intra-F1 value by 2.50 and the Inter-F1 value by 0.98. In addition, experiments on intrasentence and intersentence relation extraction show that, when the model enhances the performances of both intrasentence and intersentence relation extraction, the performance of DocRE improves. Thus, these experimental results validate the effectiveness of the two key components, semantic-guided attention and adaptive gating, in addressing intrasentence and intersentence relations in the proposed model.
Case study
Figure 4 shows the relation extraction performance of the SSAN\(_{bert}\)12, EIDER\(_{bert}\)13, and SAAG models on an instance from DocRED. The instance contains six sentences, in which both SSAN and EIDER can recognize the intrasentence relations of (Life in Color, United States) and (EDM, United States), as well as the intersentence relation of (Life in Color, SFX Entertainment). However, neither SSAN nor EIDER extracted the relation between the entities “Life in Color Festival” and “United States”. If a model can utilize the information of the entity “Life in Color” in the triplet (Life in Color, United States, P127), it can better understand the semantic information of the entity “Life in Color Festival”. EIDER focuses too much on the evidence reasoning chain; however, in this case, there is no reachable evidence chain connecting (Life in Color Festival, Life in Color), only scattered and sparsely related reasoning information, so the advantages of EIDER cannot be utilized effectively. In contrast, the proposed SAAG model focuses more on entity representation: it not only constructs the connection between “Life in Color Festival” and “Life in Color” through the attention mechanism but also assigns more fine-grained weights to the words in the sentence. Therefore, the SAAG model provides a more effective solution for extracting the relation between the entity pair (Life in Color Festival, United States).
However, it is evident that the proposed SAAG model did not correctly extract the relation between the entities “Dayglow” and “United States”, nor that between “Miami” and “United States”. In contrast, the EIDER model successfully captured the relation between “Dayglow” and “United States” by leveraging the advantages of evidence chains. It is necessary to investigate carefully the reasons for this. First, regarding the relation between “Dayglow” and “United States”, the SAAG model effectively recognizes the relation between “Life in Color” and “United States”, as well as “Life in Color” and “Dayglow”. However, it struggles with the relation mediated by the bridging entity “Life in Color”. Because “Life in Color” is mentioned most frequently and forms the longest relation chain, the semantic-guided attention technique in SAAG assigns more weight to “Life in Color”, resulting in less attention being allocated to the “Dayglow” entity within the same sentence. This leads to a lack of semantic connection when attempting to identify the relation between “Dayglow” and “United States”. Second, concerning the relation between “Miami” and “United States”, the SAAG model has certain shortcomings in common-sense reasoning methods. Furthermore, it allocated less attention to “Miami” in the same sentence as “Life in Color”, which resulted in failure to extract the relation between these entities accurately.
Conclusion
The SAAG model is developed specifically to address the issue of semantic ambiguity between entities in DocRE. The model utilizes semantic guidance techniques to model different representations of intrasentence and intersentence entities. In addition, it employs an adaptive gating mechanism to distinguish between intrasentence and intersentence entity pairs dynamically. However, this study still has somewhat coarse granularity in defining the relationships between entities.
Planned future work will focus on different reasoning paths37,38 between entities, further refining the distinctions between intrasentence and intersentence entities to establish more-detailed entity pair relations, thereby enhancing the performance of DocRE.
Data availability
The datasets generated during and/or analyzed during the current study are available from the corresponding author on reasonable request.
References
Speer, R., Chin, J., Havasi, C. Conceptnet 5.5: An open multilingual graph of general knowledge. in Proceedings of the 31st AAAI Conference on Artificial Intelligence, 4444–4451 (2017).
Zhang, S., Yao, D., Zhao, Z., Chua, T.S., Wu, F. Causerec: Counterfactual user sequence synthesis for sequential recommendation. in Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval, 367–377 (2021).
Yang, W., Xie, Y., Lin, A., Li, X., Tan, L., Xiong, K., Lin, J. End-to-end open-domain question answering with bertserini. in Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics (NAACL), 72–77(2019).
Christopoulou, F., Miwa, M., Ananiadou, S. Connecting the dots: Document-level neural relation extraction with edge-oriented graphs. in Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing (EMNLP), 4925–4936 (2019).
Nan, G., Guo, Z., Sekuli, I., Lu, W. Reasoning with latent structure refinement for document-level relation extraction. in Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics (ACL), 1546–1557 (2020).
Wang, D., Hu, W., Cao, E., Sun, W. Global-to-local neural networks for document-level relation extraction. in Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), 3711–3721 (2020).
Yao, Y., Ye, D., Li, P., Han, X., Lin, Y., Liu, Z., Sun, M. DocRED: A large-scale document-level relation extraction dataset. in Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics(ACL), 764–777 (2019).
Zeng, S., Xu, R., Chang, B., Li, L. Double graph based reasoning for document-level relation extraction. in Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), 1630–1640 (2020).
Huang, H., Lei, M. & Feng, C. Graph-based reasoning model for multiple relation extraction. Neurocomputing. 420, 162–170 (2021).
Zeng, S., Wu, Y., Chang, B. Sire: Separate intra-and inter-sentential reasoning for document-level relation extraction. in Findings of the Association for Computational Linguistics (ACL-IJCNLP), 524–534(2021).
Zhou, W., Huang, K., Ma, T., Huang, J. Document-level relation extraction with adaptive thresholding and localized context pooling. in Proceedings of the AAAI Conference on Artificial Intelligence, 14612–14620 (2021).
Xu, B., Wang, Q., Lyu, Y., Zhu, Y., Mao, Z. Entity structure within and throughout: Modeling mention dependencies for document-level relation extraction. in Proceedings of the AAAI Conference on Artificial Intelligence, 14149–14157 (2021).
Xie, Y., Shen, J., Li, S., Mao, Y., Han, J. Eider: Empowering Document-level Relation Extraction with Efficient Evidence Extraction and Inference-stage Fusion. in Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (ACL), 257–268 (2022).
Yu, J., Yang, D., Tian, S. Relation-Specific Attentions over Entity Mentions for Enhanced Document-Level Relation Extraction. in Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics (NAACL), 1523–1529 (2022).
Yuan, C., Cao, Y., Huang, H. Collective prompt tuning with relation inference for document-level relation extraction. Inform. Process. Manag. 60(5), 103451 (2023).
Han, R. et al. Document-level relation extraction with relation correlations. Neural Networks 171, 14–24 (2024).
Liu, H., Kang, Z., Zhang, L., Tian, L., Hua, F. Document-level relation extraction with cross-sentence reasoning graph. in Pacific-Asia Conference on Knowledge Discovery and Data Mining, 316–328 (2023).
Ma, Y., Wang, A., Okazaki, N. DREEAM: Guiding Attention with Evidence for Improving Document-Level Relation Extraction. in Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics (EACL), 1971–1983 (2023).
Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Polosukhin, I. Attention is all you need. in Advances in Neural Information Processing Systems, 5998–6008 (2017).
Zhang, N., Chen, X., Xie, X., Deng, S., Tan, C., Chen, M., Chen, H. Document-level relation extraction as semantic segmentation. in Proceedings of the 30th International Joint Conference on Artificial Intelligence (IJCAI), 3999–4006 (2021).
Wang, H. et al. Document-level relation extraction using evidence reasoning on RST-GRAPH. Knowledge-Based Syst. 228, 107274 (2021).
Yuan, C., Huang, H., Feng, C., Shi, G. & Wei, X. Document-level relation extraction with entity-selection attention. Inform. Sci. 568, 163–174 (2021).
Dixit, K., Al-Onaizan, Y. Span-level model for relation extraction. in Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics (ACL), 5308–5314 (2019).
Kingma, D.P., Ba, J. Adam: A method for stochastic optimization. in International Conference on Learning Representations (ILCR). arXiv preprint (2015). arXiv:1412.6980.
Li, J., Sun, Y., Johnson, R.J., Sciaky, D., Wei, C.H., Leaman, R., Lu, Z. BioCreative V CDR task corpus: A resource for chemical disease relation extraction. Database (2016).
Xu, W., Chen, K., Zhao, T. Document-level relation extraction with reconstruction. in Proceedings of the 35th AAAI Conference on Artificial Intelligence (AAAI), 14167–14175 (2021).
Devlin, J., Chang, M.W., Lee, K., Toutanova, K. Bert: Pre-training of deep bidirectional transformers for language understanding. in Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics (NAACL), 4171–4186 (2019).
Li, J., Xu, K., Li, F., Fei, H., Ren, Y., Ji, D. MRN: A locally and globally mention-based reasoning network for document-level relation extraction. in Findings of the Association for Computational Linguistics (ACL-IJCNLP), 1359–1370 (2021).
Tan, Q., He, R., Bing, L., Ng, H.T. Document-level relation extraction with adaptive focal loss and knowledge distillation. in Findings of the Association for Computational Linguistics (ACL), 1672–1681 (2022).
Zhang, Z., Zhao, S., Zhang, H., Wan, Q., Liu, J. Document-level relation extraction with three channels. Knowl.-Based Syst. 111281 (2024).
Xu, T. et al. Evidence Reasoning and Curriculum Learning for Document-Level Relation Extraction. IEEE Transactions on Knowledge and Data Engineering 36(2), 594–607 (2024).
Wolf, T., Debut, L., Sanh, V., Chaumond, J., Delangue, C., Moi, A., Cistac, P., et al. Transformers: State-of-the-art Natural Language Processing. in Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), 38–45 (2020).
Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lerer, A. Automatic differentiation in pytorch. in Proceedings of the 31st International Conference on Neural Information Processing Systems (2017).
Pennington, J., Socher, R., Manning, C.D. Glove: Global vectors for word representation. in Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), 1532–1543 (2014).
Schuster, M. & Paliwal, K. K. Bidirectional recurrent neural networks. IEEE Trans. Signal Process. 45(11), 2673–2681 (1997).
Beltagy, I., Lo, K., Cohan, A. SciBERT: A pretrained language model for scientific text. in Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), 3615–3620 (2019).
Peng, X., Zhang, C., Xu, K. Document-level Relation Extraction via Subgraph Reasoning. in Proceedings of the 31st International Joint Conference on Artificial Intelligence (IJCAI), 4331–4337 (2022).
Huang, H., Lei, M. & Feng, C. Graph-based reasoning model for multiple relation extraction. Neurocomputing 420, 162–170 (2021).
Acknowledgements
This work is supported by the Henan Open University Doctoral Research Initiation Fund Project (Grant Number:BSJH-2024-01), the Key Scientific Research Project of Higher Education Institutions in Henan Province (Grant Number: 25A520058), and the Henan Open University Horizontal Research Project (Grant Number: HXKT-202205).
Ethics declarations
Competing interest
The authors declare no competing interests.
Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Ding, X., Duan, S. & Zhang, Z. Semantic-guided attention and adaptive gating for document-level relation extraction. Sci Rep 14, 26628 (2024). https://doi.org/10.1038/s41598-024-78051-9