Introduction

Graph-structured data, with its flexible representation of nodes and edges, can effectively model complex systems and is widely used in fields such as literature classification and commodity recommendation1. Current mainstream analytical methods for graph-structured data can be broadly divided into two categories: graph fusion methods2 and GNN. Traditional graph fusion methods rely on manually designed features. In contrast, GNN achieve automated modeling of graph relationships through end-to-end deep learning frameworks. A core advantage of GNN is their ability to aggregate information from the direct neighbors of any node3. This process constructs representations that are feature-rich and distinctive, effectively capturing the complex dynamics and interdependencies inherent in graph structures. Additionally, GNN have significant advantages in architectural design: they can simultaneously consider the features of individual nodes and the contextual information provided by their neighboring nodes4. This dual focus enables GNN to develop a comprehensive perspective of the graph, producing representations that reflect both the essential characteristics of local node features and the broader global structure of the network.

The effectiveness of GNN is contingent upon a key assumption: that the node features and structural relationships in the dataset are comprehensive and accurate. In practical application scenarios, however, this assumption often does not hold. For example, in e-commerce, consumer privacy preferences can obscure critical insights into purchasing behaviors; in citation networks, nonstandardized attributes of academic articles hinder effective attribute extraction; and in environmental monitoring, sensor failures can lead to the loss of critical data5,6. Such data incompleteness poses a significant challenge to the performance of GNN.

To address this challenge, the field of graph completion learning (GCL) has emerged. GCL has achieved significant advances in semi-supervised node classification tasks with missing attributes, focusing on reconstructing absent node attributes by exploiting the relationship between the existing structural relationships and the missing attributes. Existing attribute completion methods fall primarily into two categories: imputation-based attribute completion and model-learning-based attribute completion.

Imputation-based attribute completion methods use reasoning or estimation to fill in missing node attributes, typically drawing on information already present in the graph structure. For example, the neighborhood averaging method7 completes the data with the average of neighbor attributes; this is simple but ignores structural differences. The k-nearest neighbors imputation method8 makes better use of local structural similarities in the graph, but it is limited by the preset k value and cannot adapt well to graphs with different density distributions. These methods are efficient and easy to implement; however, their linear assumptions and shallow reasoning mechanisms struggle to capture the complex nonlinear relationships in graph data.
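To make the idea concrete, the sketch below shows a minimal NumPy implementation of neighborhood-average imputation in this spirit; the function name, the boolean `missing_mask`, and the global-mean fallback for nodes without observed neighbors are our own illustrative choices rather than details of the cited method7.

```python
import numpy as np

def neighbor_mean_impute(A, X, missing_mask):
    """Fill missing node features with the mean of their observed neighbors.

    A: (N, N) binary adjacency matrix of an undirected graph.
    X: (N, d) feature matrix; rows flagged in `missing_mask` are unknown.
    missing_mask: (N,) boolean array, True where a node's features are missing.
    """
    X_filled = X.copy()
    observed = ~missing_mask
    for i in np.where(missing_mask)[0]:
        neighbors = np.where((A[i] > 0) & observed)[0]   # observed neighbors only
        if neighbors.size > 0:
            X_filled[i] = X[neighbors].mean(axis=0)
        else:
            X_filled[i] = X[observed].mean(axis=0)       # fallback: global mean
    return X_filled
```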

Model-learning-based attribute completion methods have been developed to address these limitations. They use deep learning models to identify complex relationships between graph structures and attributes, enabling the completion of missing attributes. In recent research, the revisiting initializing then refining method9 has improved the accuracy and robustness of attribute completion through high-order structural matrix approximation and initialization optimization strategies; however, it still depends heavily on the quality of the initialization. The partial graph convolutional network method10 reduces this over-reliance on initialization by using partial aggregation functions and a streamlined model to handle incomplete graph data effectively. However, the framework's persistent limitation lies in its underutilization of the topological relationships inherent in graph-structured data.

The effectiveness of GCL methods essentially depends on the accuracy of the structural relationships within the data. In practical applications, data incompleteness and errors in structural connectivity are common, and the collected structural relationships often contain incorrect connections. Such misleading structural relationships can harm the accuracy of the completed node attributes and further degrade the performance of downstream semi-supervised node classification tasks. Researchers have recently started to address this issue by designing a multi-view graph imputation network11, which treats attributes and structure as two views. Its contrastive alignment mechanism boosts cross-view consistency in the latent space, yielding better generalizability. However, it decouples attributes from the graph structure, overlooking the fact that substantial attribute changes may lead to structural variations in the graph. Furthermore, compared with methods such as the graph convolutional network (GCN)12 and other GNN, graph autoencoders are unsupervised learning models; they may not fully utilize existing labels and can fail to capture complex features and patterns in graph structures, which negatively impacts downstream task performance.

To address the above challenges, we propose the enhanced graph structure (EGS) method for semi-supervised node classification tasks with missing attributes. The main idea is to dynamically reconstruct node features and update the graph structure during the attribute completion process, combining prior knowledge with data-driven learning through multi-regularization joint optimization; this improves the robustness and interpretability of graph completion tasks. Inspired by feature propagation (FP)13, we note the advantages of the smoothness constraint provided by Dirichlet energy. However, we also observe that the Dirichlet energy may overly rely on the accuracy of the initial graph structure and potentially ignore the fidelity of attribute data. To address these dual limitations, we develop a unified constraint mechanism that simultaneously optimizes the preservation of node structural relationships and the fidelity of attribute reconstruction, forming a new objective function. When optimizing such an objective, traditional methods fix the Laplacian matrix and only optimize the feature matrix, which prevents dynamic correction of the graph structure based on data and limits the model's ability to express complex relationships. To address this, we introduce an alternating optimization algorithm that dynamically and alternately corrects the attributes and the graph structure, reconstructing more accurate graph data. The reconstructed node attributes can then be applied to downstream semi-supervised classification tasks, significantly improving the performance of GNN in incomplete data scenarios.

The main contributions of this work are as follows:

  • We propose a novel enhanced graph structure (EGS) framework for semi-supervised node classification tasks with missing attributes. It highlights the importance of graph structure updates in attribute completion;

  • We combine Dirichlet energy with dual constraints for graph structure updates and attribute reconstruction. This provides an effective objective function. During optimization, we introduce a joint linear optimization scheme. It dynamically reconstructs features and updates graph structures through alternating optimization, better completing the graph data;

  • Extensive experiments on five widely used benchmarks, five different attribute missing rates and seven GNN variants demonstrate the scalability and superiority of the proposed method compared with existing GCL methods.

Related work

Graph neural networks

GNN are deep learning models designed for graph-structured data. They generate node or graph embeddings by iteratively aggregating information from neighboring nodes. Based on the information aggregation mechanism, GNN are mainly divided into two categories: frequency-domain GNN and spatial-domain GNN.

Frequency-domain GNN define graph convolution from a signal-processing perspective. They map graph data to the frequency domain using the Fourier transform, perform filtering operations, and then transform it back to the spatial domain for feature extraction. Traditional frequency-domain GNN14,15,16 improve performance through convolution and kernel changes, but they incur high computational complexity. Recently, the adaptive spectral wavelet transform-based self-supervised GNN17 was proposed. It embeds nodes using similar network neighborhoods and features, reducing complexity. However, these methods rely on graph Laplacian matrix decomposition or polynomial approximation, making it difficult to scale to large-scale dynamic graphs.

Spatial-domain GNN directly define convolution operations in the node domain. They generate the next layer's node features by aggregating neighbor node features, avoiding frequency-domain transformation. This makes them more suitable for large-scale graph data. GCN is a classic spatial-domain GNN. It obtains graph structure information by performing convolution operations on the adjacency matrix, laying the foundation for subsequent research. Graph sample and aggregate (GraphSAGE)18 and the graph isomorphism network19 enhance aggregation, improving performance. However, these methods have high complexity. Recently, a multimodal adaptive spatio-temporal GNN model20 combines multi-dimensional and spatio-temporal features in complex spatial data. It improves performance while effectively reducing computational complexity. However, the completeness and accuracy of node features and structural relationships remain key factors affecting GNN performance. In practical applications, missing data is common for various reasons, and when data is missing, the performance of GNN can fluctuate significantly.

Graph completion learning

In the field of graph data mining, one of the core challenges in GCL is the issue of missing attributes. Traditional solutions are mainly divided into two categories: interpolation-based completion and model-learning-based completion.

Interpolation-based completion uses a graph Laplacian matrix-guided k-nearest neighbors imputation method or graph Fourier interpolation for function reconstruction. While such approaches are computationally efficient and simple to implement, they fundamentally neglect the structural semantics required for comprehensive graph representation learning. Model-learning-based completion employs deep models such as the graph autoencoder21 or GCN for feature reconstruction, effectively alleviating this issue. For example, learning on attribute-missing graphs22 combines a Transformer and a GNN, where the former reconstructs missing features and the latter handles downstream tasks. However, it has high computational complexity and is difficult to extend to heterogeneous graphs. To address this, researchers proposed a higher-order heterogeneous GNN method23. It uses a transformer with self-attention to fill missing attributes and adopts an attribute enhancement strategy, enabling the model to fully learn the attributes of heterogeneous neighbors. However, these methods often handle missing attributes separately, neglecting the importance of graph structure learning.

Graph structure learning can effectively constrain feature propagation paths through precise topological connections, improving completion accuracy to some extent. Current graph structure learning methods fall into two typical categories: supervised graph structure learning and unsupervised graph structure learning. Supervised graph structure learning uses known node labels or other forms of supervision to guide the optimization of graph structure. For instance, probabilistic semi-supervised learning via sparse graph structure learning24 constructs a semi-supervised graph structure through sparse weighting, effectively integrating unlabeled data. However, node degree issues may affect representation fairness. To solve this, the structural rebalance GNN model25 combines graph structure balancing with adversarial learning to improve representation fairness. However, it relies on local labels, which can lead to data distribution skew and edge connection bias. Unsupervised methods alleviate reliance on local labels by using contrastive learning or random walks to mine potential connection patterns. Traditional unsupervised graph structure learning methods26,27 are relatively simple but underutilize node features. To address this, the self-constrained graph contrast enhancement network28 designs a multi-branch strategy to generate graph structures and combines a feature contrast enhancement module to extract discriminative features. However, this network ignores the fact that updated attribute information can, in turn, promote the optimization of the adjacency matrix. Theoretical analysis reveals that graph topology refinement and node feature recovery inherently exhibit mutually enhancing dynamics through co-evolutionary optimization mechanisms.

Recent research has begun exploring the combination of attribute completion and graph structure learning. For example, the attribute-structure decoupled variational autoencoder29 adopts a different strategy: it encodes attributes and structures separately, aiming to learn a shared latent space by maximizing the joint probability distribution between different view representations. However, graph autoencoders may not fully utilize existing labels, making it difficult to comprehensively capture complex features and patterns in graph structures. Additionally, most current methods tend to use static fusion mechanisms, failing to deeply consider other potential effects caused by attribute updates or graph structure adjustments.

Dirichlet energy

Research related to Dirichlet energy in the field of graph data mainly focuses on GNN optimization and graph structure representation learning. Its core idea is to regulate the smoothness and separability of node features through energy constraints, thereby enhancing model performance. In some graph-related tasks30,31, Dirichlet energy has been used as a regularization tool. This benefits from the smoothness criterion in Dirichlet energy: it measures the “roughness” or “degree of variation” of a function by integrating the square of its gradient. When the Dirichlet energy of a function \(\textrm{x}\) is small, it means the gradient of the function is relatively small across the entire region, indicating smooth changes and good smoothness. However, excessive smoothness may also lead to the homogenization of node features, losing their discriminative power. To prevent this over-smoothing phenomenon, energy GNN32 cleverly sets lower and upper bounds on the Dirichlet energy at each layer to avoid performance degradation due to excessive smoothness. Recently, the DESAlign33 method uses the generalizable theoretical principles inspired by Dirichlet energy to guide multi-modal knowledge graph learning, ensuring the optimization of semantic consistency. Although Dirichlet energy has shown great potential in these applications, it still has some issues. For example, it overly relies on the accuracy of the initial graph structure and tends to ignore the fidelity of observed data. If the initial graph structure itself has missing or erroneous information, relying solely on Dirichlet energy for optimization may lead to results that increasingly deviate from the true situation. Additionally, pursuing smoothness alone may overly suppress local differences in node features, causing the completion results to significantly deviate from the real observed data.

Methodology

In this section, we detail the proposed EGS framework. Figure 1 visually shows the overall architecture of our method. For data with missing attributes, especially missing node features, the goal of this framework is to learn an optimal graph representation by dynamically updating the graph structure after an initial attribute reconstruction. This helps explore higher-order relationships in the data and addresses the missing attribute problem from two perspectives. Specifically, our method first constructs a graph to represent the correlations between data points. During this process, the framework flexibly supports any graph construction strategy. Based on the initial graph, to overcome the limitations of Dirichlet energy, namely its over-reliance on the accuracy of the initial graph structure and its potential neglect of observed data fidelity, we build an objective function that jointly optimizes attribute features and graph structure. In each iteration, starting from the attribute level, we first reconstruct the feature matrix. Then, based on the updated feature matrix, we further update the graph structure to obtain the updated normalized Laplacian matrix. This process repeats until the objective function value stabilizes and no longer decreases significantly. During this time, the graph structure is continuously optimized. Finally, through the multi-regularization joint optimization strategy, we combine prior knowledge with data-driven learning to reconstruct an optimal graph representation. This lays a solid foundation for subsequent classification tasks. The following sections provide a more detailed analysis of this method.

Fig. 1. The whole framework of the proposed EGS method.

EGS problem formulation

In this study, we consider an undirected graph defined by an adjacency matrix \(\textrm{A}\) and a node feature matrix \(\textrm{X}\). The normalized Laplacian matrix of the graph, \(\Delta =\textrm{I}-\widetilde{\textrm{A}}\), is a positive semi-definite matrix built from the normalized adjacency matrix \(\widetilde{\textrm{A}}\) and the degree matrix \(\textrm{D}\). Specifically, \(\widetilde{\textrm{A}}=\textrm{D}^{-\frac{1}{2}} \textrm{AD}^{-\frac{1}{2}}\) is obtained by symmetrically normalizing \(\textrm{A}\) with the diagonal degree matrix \(\textrm{D}\), where \(\textrm{D}=\operatorname {diag}\left( \sum _{\textrm{j}} \textrm{a}_{1 \textrm{j}}, \ldots , \sum _{\textrm{j}} \textrm{a}_{\textrm{nj}}\right)\) contains the degree of each node.
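For readers who want a concrete reference, a minimal NumPy sketch of these definitions follows; treating the adjacency matrix as dense and clamping zero degrees with a small epsilon are simplifying assumptions on our part.

```python
import numpy as np

def normalized_laplacian(A, eps=1e-12):
    """Return A_tilde = D^{-1/2} A D^{-1/2} and Delta = I - A_tilde.

    A: (N, N) symmetric adjacency matrix of an undirected graph.
    """
    deg = A.sum(axis=1)                                   # node degrees
    d_inv_sqrt = 1.0 / np.sqrt(np.maximum(deg, eps))      # guard isolated nodes
    A_tilde = A * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    Delta = np.eye(A.shape[0]) - A_tilde
    return A_tilde, Delta
```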

In graph learning with missing features, the structure of graph \(\textrm{G}\) is typically assumed to be known, while labels and node features are available only on a subset of nodes. However, severely missing node features can alter the semantic information of the graph, leading to changes in the graph structure. For example, in molecular graphs, missing nodes can change molecular properties, resulting in entirely different molecules. Therefore, updating the graph structure plays a crucial role in addressing feature-missing problems. Specifically, our goal is to learn a function \(\ell (\textrm{X},\Delta ,\textrm{G})\) that infers missing feature values using the existing graph structure, dynamically updates the graph structure via the normalized Laplacian matrix, and ultimately reconstructs the complete feature matrix \(\textrm{X}\) from the known partial features \(\mathrm {X_{k}}\). To define this objective function, two fundamental conditions must be satisfied.

Smoothness Constraint: The imputed features should exhibit smoothness in both attributes and structure. Specifically, this requires the feature matrix to be smooth on the normalized Laplacian matrix. Dirichlet energy enforces the smoothness of the imputed feature matrix across the graph structure by penalizing differences between neighboring node features, aligning with the homophily assumption of graph data (neighboring nodes share similar features). Additionally, the Laplacian matrix directly encodes the graph’s topological structure, incorporating structural information into the optimization objective through a quadratic form, thereby preventing imputation results from contradicting the graph topology. To this end, this paper adopts Dirichlet energy as a quantitative measure of smoothness. The mathematical expression of the Dirichlet energy \(\textrm{f}\) is shown in Eq. (1):

$$\begin{aligned} \textrm{f}=\frac{1}{2} \textrm{X}^{\textrm{T}} \Delta \textrm{X}=\frac{1}{2} \sum _{\textrm{ij}} \tilde{\textrm{a}}_{\textrm{ij}}\left( \textrm{X}_{\textrm{i}}-\textrm{X}_{\textrm{j}}\right) ^{2} \end{aligned}$$
(1)

where \(\textrm{X}\) represents the feature matrix and \(\Delta\) is the normalized Laplacian matrix. \(\mathrm {X_{i}}\) denotes the feature of node \(\textrm{i}\) in \(\textrm{X}\), and \(\tilde{\textrm{a}}_{\textrm{ij}}\) is the corresponding element of the normalized adjacency matrix \(\tilde{\textrm{A}}\).
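As a quick illustration of Eq. (1), the matrix form can be evaluated in a few lines; we use the trace so the expression also covers multi-dimensional node features, which is our reading of the formula rather than a detail stated above.

```python
import numpy as np

def dirichlet_energy(X, Delta):
    """Matrix form of Eq. (1): 0.5 * tr(X^T Delta X).

    X: (N, d) node feature matrix, Delta: (N, N) normalized Laplacian.
    Small values indicate that features vary smoothly over the graph.
    """
    return 0.5 * float(np.trace(X.T @ Delta @ X))
```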

Empirical Loss Minimization: The original Dirichlet energy may overlook the fidelity of observed data. Relying solely on smoothness could excessively suppress local differences in node features, causing imputation results to deviate from the true observed data \(\mathrm {X_{0}}\) (e.g., some nodes already have high-confidence labels). By using the Frobenius norm, the imputed feature matrix \(\textrm{X}\) is forced to remain as close as possible to the initial observed data \(\mathrm {X_{0}}\), preventing the loss of known information due to over-smoothing. Unlike node-wise L2 constraints, the Frobenius norm globally constrains the difference between \(\textrm{X}\) and \(\mathrm {X_{0}}\), making it more suitable for matrix-based feature imputation. Additionally, the original Dirichlet energy overly relies on the accuracy of the initial graph structure. When optimizing \(\Delta\), its deviation from the initial structure \(\Delta _{0}\) is constrained to prevent instability in structure learning caused by data noise while also reducing dependence on the initial graph structure during later optimization. Therefore, this paper incorporates the Frobenius norm into the objective function to quantify the difference between the transformed feature matrix and the initial feature matrix, as well as the deviation between the transformed normalized Laplacian matrix and the initial normalized Laplacian matrix, ensuring better smoothing of attributes and graph structure. The specific expressions are shown in Eqs. (2) and (3):

$$\begin{aligned} & \mathscr {R}_{\text{ emp1 }}=\left\| \textrm{X}-\textrm{X}_{0}\right\| _{\textrm{F}}^{2} \end{aligned}$$
(2)
$$\begin{aligned} & \mathscr {R}_{\textrm{emp2}}=\left\| \Delta -\Delta _{0}\right\| _{\textrm{F}}^{2} \end{aligned}$$
(3)

where \(\mathrm {X_{0}}\) and \(\Delta _{0}\) denote the initial feature matrix and the initial normalized Laplacian matrix, respectively. \(\mathscr {R}_{\text{ emp1 }}\) and \(\mathscr {R}_{\text{ emp2 }}\) denote the empirical losses on the attributes and the graph structure, respectively.

Combining the above two conditions, the objective function for multi-objective optimization is constructed using two Frobenius norm regularizations and Dirichlet energy. The convexity of the Dirichlet energy and the Frobenius norm ensures that the objective function converges to a local optimal solution during alternating optimization. This approach not only effectively addresses overfitting but also simplifies mathematical processing. The expression is shown in Eq. (4):

$$\begin{aligned} \ell (\textrm{X}, \Delta , \textrm{G}) = \frac{1}{2} \textrm{X}^{\textrm{T}} \Delta \textrm{X} + \lambda _{1}\left\| \textrm{X}-\textrm{X}_{0}\right\| _{\textrm{F}}^{2} \! + \lambda _{2}\left\| \Delta -\Delta _{0}\right\| _{\textrm{F}}^{2} \end{aligned}$$
(4)

where \(\lambda _{1}\) and \(\lambda _{2}\) are parameters that balance the weights of the components in the objective function. By minimizing this function, we dynamically address both attribute reconstruction and graph structure updating, achieving imputed features that satisfy the smoothness requirement while remaining consistent with the initial data.

Optimized solution

In this section, we detail the optimization of the objective function in Eq. (4), which involves the joint learning of the feature matrix and the normalized Laplacian matrix. To achieve this, we employ an alternating optimization strategy: we fix one variable, update the other, and iterate until the objective function converges or the improvement becomes negligible. At that point, the function's gradient remains relatively small across the domain and the function varies smoothly, which facilitates gradual convergence toward the optimal solution.

Specifically, we first perform attribute reconstruction by optimizing the feature matrix \(\textrm{X}\) using the existing normalized Laplacian matrix \(\Delta\). After fixing \(\Delta\), the subproblem for optimizing \(\textrm{X}\) can be expressed as Eq. (5):

$$\begin{aligned} \arg \min _{\textrm{X}} \ell (\textrm{X}, \textrm{G})=\frac{1}{2} \textrm{X}^{\textrm{T}} \Delta \textrm{X}+\lambda _{1}\left\| \textrm{X}-\textrm{X}_{0}\right\| _{\textrm{F}}^{2} \end{aligned}$$
(5)

At this point, the function is convex. To solve this optimization problem, we compute the gradient of Eq. (5) with respect to \(\textrm{X}\), yielding the result expressed in Eq. (6):

$$\begin{aligned} \nabla _{\textrm{X}} \ell =\Delta \textrm{X}+2 \lambda _{1}\left( \textrm{X}-\textrm{X}_{0}\right) \end{aligned}$$
(6)

Setting the gradient to zero to find the extremum point yields the closed-form update. Since \(\Delta\) is the normalized Laplacian matrix and is positive semi-definite, we prove that \((\frac{1}{2\lambda _{1} } \Delta + \textrm{I})\) is invertible (see Proof of Reversibility for details). The iterative equation for updating the feature matrix is shown in Eq. (7):

$$\begin{aligned} \textrm{X}=\left( \frac{1}{2 \lambda _{1}} \Delta +\textrm{I}\right) ^{-1} \textrm{X}_{0} \end{aligned}$$
(7)
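In practice, Eq. (7) does not require forming the explicit inverse; solving the corresponding linear system is cheaper and more numerically stable. A minimal sketch, assuming dense NumPy arrays:

```python
import numpy as np

def update_features(Delta, X0, lam1):
    """Closed-form X update of Eq. (7): X = (Delta / (2*lam1) + I)^{-1} X0."""
    n = Delta.shape[0]
    M = Delta / (2.0 * lam1) + np.eye(n)
    return np.linalg.solve(M, X0)   # solve M X = X0 instead of inverting M
```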

Next, we update the graph structure by optimizing the normalized Laplacian matrix \(\Delta\) given the feature matrix \(\textrm{X}\). After fixing \(\textrm{X}\), the subproblem for optimizing \(\Delta\) is shown in Eq. (8):

$$\begin{aligned} \arg \min _{\Delta } \ell (\Delta , \textrm{G})=\frac{1}{2} \textrm{X}^{\textrm{T}} \Delta \textrm{X}+\lambda _{2}\left\| \Delta -\Delta _{0}\right\| _{\textrm{F}}^{2} \end{aligned}$$
(8)

Similarly, the function is convex at this point. Computing the gradient of Eq. (8) with respect to \(\Delta\) yields the result expressed in Eq. (9):

$$\begin{aligned} \nabla _{\Delta } \ell =\frac{1}{2} \textrm{X}\textrm{X}^{\textrm{T}}+2 \lambda _{2}\left( \Delta -\Delta _{0}\right) \end{aligned}$$
(9)

By setting the derivative equal to zero to find the extremum point (the scalar matrix \(\frac{1}{4\lambda _{2} }\textrm{I}\) is trivially invertible), we derive the iterative equation shown in Eq. (10) to update the normalized Laplacian matrix:

$$\begin{aligned} \Delta =\Delta _{0}-\frac{1}{4 \lambda _{2}} \textrm{XX}^{\textrm{T}} \end{aligned}$$
(10)

The alternating optimization of the two forms a closed-loop feedback mechanism, effectively enhancing the rationality of the imputation results. In alternating optimization, each subproblem is convex, ensuring convergence to a local optimum in every update step. Through iterative refinement, the system progressively approaches the global optimal solution.

The algorithm shown in figure a outlines the implementation details of this alternating optimization process.
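To complement the algorithm, the following is a minimal NumPy sketch of one possible implementation of the alternating updates in Eqs. (7) and (10) with the objective of Eq. (4); the convergence tolerance, the iteration cap, and the relative stopping criterion are illustrative assumptions, not settings taken from the paper.

```python
import numpy as np

def objective(X, Delta, X0, Delta0, lam1, lam2):
    """Objective of Eq. (4), with the Dirichlet term written in trace form."""
    smooth = 0.5 * np.trace(X.T @ Delta @ X)
    return (smooth
            + lam1 * np.linalg.norm(X - X0, 'fro') ** 2
            + lam2 * np.linalg.norm(Delta - Delta0, 'fro') ** 2)

def egs_alternating(X0, Delta0, lam1=0.1, lam2=0.1, max_iter=35, tol=1e-5):
    """Alternate the closed-form updates of Eq. (7) and Eq. (10)."""
    n = Delta0.shape[0]
    X, Delta = X0.copy(), Delta0.copy()
    prev = objective(X, Delta, X0, Delta0, lam1, lam2)
    for _ in range(max_iter):
        # Attribute reconstruction, Eq. (7): X = (Delta/(2*lam1) + I)^{-1} X0
        X = np.linalg.solve(Delta / (2.0 * lam1) + np.eye(n), X0)
        # Graph structure update, Eq. (10): Delta = Delta0 - (1/(4*lam2)) X X^T
        Delta = Delta0 - (X @ X.T) / (4.0 * lam2)
        cur = objective(X, Delta, X0, Delta0, lam1, lam2)
        if abs(prev - cur) < tol * max(1.0, abs(prev)):   # relative stopping rule
            break
        prev = cur
    return X, Delta
```

The completed feature matrix X returned by such a routine would then be fed to the downstream GNN classifier.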

Computation complexity

In this section, we analyze the computational complexity of our proposed EGS method. The EGS method mainly consists of two alternating optimization steps: updating the feature matrix \(\textrm{X}\) and updating the normalized Laplacian matrix \(\Delta\).

  • Feature Matrix Optimization: this step involves solving a linear system of equations; the complexity of inverting the matrix is generally \(O(N^3)\);

  • Graph Structure Update: updating the normalized Laplacian matrix \(\Delta\) is dominated by the matrix multiplication, whose complexity is \(O(Nd^2)\).

Experiments

Experimental settings

Datasets

To demonstrate the proposed EGS's effectiveness and scalability, we conduct extensive experiments on five datasets, including the citation network datasets34 (Cora, Citeseer and Pubmed) and the Amazon datasets35 (Amazon-Photo and Amazon-Computers). For clarity, we denote the Amazon-Photo dataset as Photo and the Amazon-Computers dataset as Computers. We report the relevant statistics in detail in Table 1, including the number of nodes, the number of edges, the feature dimensions, and the number of categories.

Table 1 Details of the datasets.

Implementation details

This study adopts the following setup to ensure the reliability and reproducibility of the results. The five datasets mentioned in Section Datasets are split into training/validation/testing sets with random but fixed ratios, and a different missing-feature mask is applied for each experimental run (unless specified differently in specific codes or papers). Specifically, the experiments follow the strategy of a previous study36, allocating 20 nodes per class for the training set, 1500 nodes in total for the validation set, and the rest for the test set.

Regarding hyperparameter selection, two strategies have been employed to ensure a fair comparison with existing methods. Where hyperparameters are delineated in the original papers or codes, they have been directly used for the reproduction and subsequent comparison of results. For methods without provided parameters, we employ the same hyperparameter configurations within this paper to facilitate replication and comparative analysis.

During model training, we employed the Adam optimizer with a learning rate of 0.001. For the graph neural networks, two-layer and four-layer architectures were implemented, with a maximum training epoch of 500. To prevent overfitting, we incorporated early stopping (triggered after 10,000 consecutive steps without validation loss improvement) and a dropout rate of 0.9. Empirical results demonstrate full convergence across all experimental datasets within 35 iterations. To assess model performance, we identify three pivotal parameters: \(\lambda _{1}\), \(\lambda _{2}\) and the hidden dimension, which are subjected to a grid search. A detailed examination of the parameter selection process and its influence on model performance is reserved for the ensuing section on parameter analysis (Section Effects of Parameter). All experiments are conducted on the RTX 3080x2 (20GB) and the RTX 4090 (24GB), ensuring sufficient computational resources and accelerating execution speed.
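For reference, a hedged sketch of how such a grid search might be organized is given below; the candidate grids and the `train_and_evaluate` helper are hypothetical placeholders rather than the exact search space used in the experiments.

```python
from itertools import product

# Hypothetical candidate grids; the exact values searched are not listed in the text.
lambda1_grid = [0.001, 0.01, 0.1, 1, 10, 100]
lambda2_grid = [0.001, 0.01, 0.1, 1, 10, 100]
hidden_grid = [64, 256, 1024]

def grid_search(train_and_evaluate):
    """Return the configuration with the best validation accuracy.

    `train_and_evaluate(lam1, lam2, hidden)` is a user-supplied function that
    trains the downstream GNN on EGS-completed features and reports accuracy.
    """
    best_cfg, best_acc = None, -1.0
    for lam1, lam2, hidden in product(lambda1_grid, lambda2_grid, hidden_grid):
        acc = train_and_evaluate(lam1, lam2, hidden)
        if acc > best_acc:
            best_cfg, best_acc = (lam1, lam2, hidden), acc
    return best_cfg, best_acc
```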

Comparative experiment setting

To comprehensively evaluate the performance of EGS in addressing missing attributes for node classification tasks, we conduct comparative experiments against existing missing attribute handling methods. In this section, we present the names and configurations of the selected comparison methods. Detailed experimental results and conclusions will be provided in Section Experiment results.

The specific comparison methods include partial GNN (PaGNN)37, GCN for missing features (GCNMF)38, FP, pseudo-confidence-based feature imputation (PCFI)39, a dual-channel GNN (D2PT)40 and the topology-driven attribute recovery (TDAR)41. For GCNMF and PaGNN, due to the absence of an official implementation, our replication and comparative analysis are conducted based on the implementation specifics delineated in the FP paper and its corresponding code. For FP, PCFI and TDAR, the experiments are executed utilizing their publicly available code and hyperparameters, with only the missing rate being adjusted for replication purposes. Given that D2PT does not provide sufficient settings for the Photo and Computers datasets, we confine our comparative experiments using D2PT to the Cora, Citeseer and Pubmed datasets.

Moreover, we compare our approach with traditional simple interpolation methods42, such as setting missing features to zero or to random values drawn from a standard Gaussian distribution. Additionally, we benchmark our method against other commonly used learning-based approaches, including the variational auto-encoder43, missing data imputation using generative adversarial nets44, and SAT, noting that prior research has indicated their generally lower performance compared to GCNMF and PaGNN in handling graph data.

Experiment results

Table 2 presents a performance comparison between the EGS method and six other mainstream approaches, focusing on node classification accuracy and standard deviation across different missing rates for five datasets. The detailed analysis leads to the following conclusions:

Table 2 The node classification accuracy on Cora, Citeseer, Pubmed, Photo and Computers datasets. Bold data indicates suboptimal values of accuracy, and bold and enlarged font indicates optimal values of accuracy. (Evaluation criteria: Accuracy ± Standard deviation)
  • The EGS method significantly improves the performance of the downstream GCN model by filling in missing features during the preprocessing stage. In the vast majority of missing cases, EGS achieves higher classification accuracy and more stable standard deviation than other methods, demonstrating its effectiveness in handling missing attribute problems.

  • Experiment results on five benchmark datasets show that the EGS method consistently outperforms PaGNN, GCNMF, FP, PCFI and D2PT. In terms of accuracy, a key metric for model evaluation, EGS demonstrates a clear advantage. Specifically, under five missing rates, the average accuracy of EGS exceeds that of PaGNN, GCNMF, FP, PCFI and D2PT by \(5.39\%\), \(10.15\%\), \(3.37\%\), \(3.88\%\) and \(8.15\%\), respectively. Notably, even compared to the recently popular PCFI and D2PT, EGS achieves significant gains, with maximum improvements of \(9.81\%\) and \(9.83\%\), respectively. This indicates that although D2PT handles complex missing patterns and interactions well, and PCFI shows relative robustness under extremely high missing rates, both methods still lag behind EGS in accuracy when the missing rate ranges from \(5\%\) to \(75\%\). In terms of robustness, measured by standard deviation, EGS also shows superiority. Its average stability improves by 0.64, 0.60, 0.29, 0.40 and 1.15 compared to PaGNN, GCNMF, FP, PCFI and D2PT, respectively. These results validate the effectiveness of the alternating optimization strategy used in EGS between the graph structure and the feature matrix. This strategy better captures the complex intrinsic relationships in data. Overall, the findings not only confirm the strong performance of EGS but also highlight the crucial role of graph structure optimization in enhancing feature learning.

  • In a systematic comparison with the state-of-the-art method TDAR, the EGS method further demonstrates its superior performance. Across five benchmark datasets and five missing rates, EGS achieves an average accuracy improvement of \(0.55\%\) over TDAR. Specifically, on the Citeseer dataset, EGS outperforms TDAR by an average of \(4.37\%\). On the Cora and Photo datasets, under low to moderate missing rates (\(5\%\) to \(15\%\)), EGS consistently achieves higher node classification accuracy, with a maximum improvement of \(3.01\%\). Within the \(35\%\) to \(55\%\) missing rate range, although the improvement margin of EGS is relatively limited, its performance still provides strong validation of the effectiveness of the method. EGS is only slightly outperformed by TDAR at the high \(75\%\) missing rate. However, such high missing rates are rare in real-world scenarios, making EGS's overall accuracy advantage more meaningful under typical conditions. On the Pubmed dataset, EGS surpasses TDAR by an average of \(2.55\%\) at low missing rates (\(5\%\), \(15\%\)). While EGS shows improvements, TDAR achieves better accuracy under high missing rates on the Pubmed dataset and across all missing rates on the Computers dataset. This can be attributed to dataset-specific characteristics and the design of TDAR. For example, Pubmed contains structural noise due to interdisciplinary citations, and Computers has large-scale, high-dimensional features. TDAR's adoption of node homogeneity scores and non-linkage similarity calibration helps address these challenges effectively. In terms of robustness, EGS outperforms TDAR on all five datasets, reducing the average standard deviation by 1.08, indicating better result stability. In summary, although TDAR demonstrates certain advantages on specific datasets and under extreme missing rate conditions, EGS exhibits more robust generalization and stability across a wide range of benchmark datasets, typical missing rate intervals, and overall performance metrics.

Ablation studies

Effects of different regularization terms

To validate the effectiveness of EGS’s design principles, we conducted comprehensive ablation studies on the Cora and Pubmed datasets. The experiments were designed to isolate components of the objective function, focusing on the impact of regularization terms. When retaining only the attribute regularization term (EGS (w/o \(\lambda _{2}\))), the model inferred missing attributes using existing keywords but failed to effectively link thematically related paper nodes without citation relationships due to a lack of structural constraints. When retaining only the structural regularization term (EGS (w/o \(\lambda _{1}\))), the model enhanced latent thematic associations but risked over-connecting nodes sharing keywords without direct citations, introducing noisy edges. Removing all regularization terms (EGS (w/o all)) degraded collaborative attribute-structure optimization, leaving only the Dirichlet energy baseline. In contrast, the full EGS model with dual regularization terms balanced attribute completion and structural refinement through coordinated \(\lambda _{1}\) and \(\lambda _{2}\) optimization. For fairness, all experiments maintained parameter consistency with the original EGS configuration.

The ablation study results detailed in Table 3 demonstrate that EGS significantly outperforms methods with single regularization terms in node classification, achieving improvements of \(1.3\%\) to \(4.09\%\). Compared to methods without regularization, the improvements further increase to between \(3.54\%\) and \(4.62\%\). The experiments indicate that incorporating both attribute and structural regularization into the Dirichlet energy framework prevents over-smoothing. This ensures the completion process is not overwhelmed by neighborhood features. Furthermore, the completed attributes help identify latent similar nodes, correcting structural sparsity through optimized connections. Notably, these improvements become more pronounced as the missing rate increases. These findings strongly support the critical role of dual regularization in enhancing EGS's classification performance.

Table 3 Comparison results with different regularization terms. Data in bold indicates optimal values.

Effects of different optimization methods

To systematically evaluate the effects of optimization strategies on attribute-incomplete graphs, we performed comprehensive experiments on the Cora and Pubmed benchmarks, empirically comparing joint regularization optimization with our proposed alternating optimization paradigm. The scenario where both regularization terms are optimized simultaneously is labeled “EGS-si,” and the alternating optimization of the two regularization terms described in this paper is labeled “EGS”. To ensure the fairness of the experiments, all experimental parameters were kept as consistent as possible with the EGS configuration. The results are shown in Table 4. They indicate that the alternating optimization method outperforms the simultaneous optimization method on all datasets. Specifically, the alternating optimization method improved node classification accuracy by \(0.9\%\) to \(2.92\%\) and demonstrated better stability when dealing with high rates of missing data.

Table 4 Comparison results with different optimization methods. Data in bold indicates optimal values.

Effects of parameter

In our proposed EGS, three parameters are introduced: \(\lambda _{1}\), \(\lambda _{2}\), and the hidden dimension H. The parameters \(\lambda _{1}\) and \(\lambda _{2}\) balance the dual constraints on graph structure updating and attribute reconstruction, respectively. The parameter H regulates the hidden dimension of the GCN. To assess the impact of different parameter selections on the node classification results, we conduct experiments on the Cora and Pubmed datasets at a data missing rate of \(5\%\).

The results are shown in Fig. 2. Specifically, Fig. 2a–c show the effects of different \(\lambda _{1}\), \(\lambda _{2}\) and H on the Cora dataset, respectively. Figure 2d–e show the effects on the Pubmed dataset. The node classification accuracy increased with \(\lambda _{1}\) and \(\lambda _{2}\) values up to 0.1, peaked at 0.1, and stabilized thereafter. Significant performance degradation for EGS occurred only when \(\lambda _{1} > 100\), indicating the strong robustness of the attribute reconstruction regularization term. For \(\lambda _{2}\), the model showed high parameter sensitivity at \(\lambda _{2} < 0.01\). Performance degradation emerged only when \(\lambda _{2} > 100\), suggesting that excessive structural constraints introduce bias. Notably, accuracy remained stable within \(\lambda _{2} \in [0.01, 100]\), demonstrating operational stability in this range. Based on these findings, we set \(\lambda _{1} = 0.1\), \(\lambda _{2} = 0.1\) and \(H = 1024\) as the optimal configuration.

Fig. 2. Comparison results with different hyperparameters on the Cora and Pubmed datasets.

Effects of different variants

To validate the applicability and generalization of the EGS method, we conduct a series of ablation experiments. These studies integrate the traditional multilayer perceptron (MLP)45 with popular GNN variants such as GCN, GraphSAGE, simplifying GCN (SGC)46 and graph attention networks (GAT)47, and explore combinations with methods such as GCNMF and PaGNN.

We performed experiments on the well-known Cora and Pubmed datasets and evaluated these methods at a missing rate of \(5\%\). To ensure fairness, we use the published optimal parameters for the comparative FP method. The accuracies of different methods combined with various variants in node classification tasks are shown in Table 5. The results show that EGS outperforms FP with most variants, providing evidence for the versatility and broad applicability of the EGS approach. Notably, while both EGS and FP are compatible with various GNN architectures and EGS shows superior overall performance, its performance with MLP is significantly weaker than with the other models. This can be attributed to two factors. First, MLP relies solely on node features and completely ignores graph structural information. Consequently, it cannot leverage the structure-feature co-representation optimized by EGS through Dirichlet energy. Second, EGS generates highly smoothed, low-variance features by enforcing feature homogeneity among neighboring nodes via Dirichlet energy. However, MLP depends on nonlinear discriminative feature patterns for classification. This smoothing effect diminishes the separability of features, directly weakening MLP's discriminative capability.

Table 5 Comparison of different variants at a missing rate of 5\(\%\). Data in bold indicates optimal values.

Ablation studies under high missing rate scenarios

To verify the robustness boundaries of EGS in extreme data missing scenarios, we conducted experiments with \(85\%\) and \(95\%\) missing rates on the Citeseer dataset. As shown in Table 6, at the \(85\%\) missing rate, EGS significantly outperformed traditional graph learning methods (e.g., GCNMF, PaGNN) and emerging attribute-missing models (e.g., D2PT, TDAR), achieving up to a \(15.66\%\) accuracy improvement over baselines. EGS also achieved performance comparable to PCFI and FP, which excel in high missing rate scenarios. However, at a \(95\%\) missing rate, EGS exhibited a noticeable accuracy decline and underperformed PCFI and FP.

This phenomenon is directly linked to EGS’s algorithmic design. By introducing Dirichlet energy constraints and dual regularization terms, EGS explicitly models implicit structure-feature relationships. This mechanism effectively captures the intrinsic data manifold at regular missing rates (\(\le 75\%\)), where its standard deviations are significantly lower than competitors’, demonstrating stability advantages. When missing rates exceed \(85\%\), severe disconnections occur between topological information and attribute distributions, invalidating the local smoothness assumption in Dirichlet energy constraints. This weakens the regularization terms’ ability to compensate for missing patterns. Notably, extreme missing rates (\(>85\%\)) occur far less frequently in real-world scenarios. EGS has already demonstrated generalizability advantages within regular missing rate ranges (see Table 2). Future work will focus on adaptive energy constraint functions to enhance robustness against topological disconnections in extreme missing scenarios.

Table 6 The performance of node classification under \(85\%\) and \(95\%\) missing rate. Bold data indicates suboptimal values of accuracy, and bold and enlarged font indicates optimal values of accuracy. (Evaluation criteria: Accuracy ± Standard deviation)

Conclusion

This paper proposes a general solution strategy, EGS, for handling missing attributes in graph data. The strategy adopts an alternating optimization framework to progressively impute missing features by iteratively minimizing an objective function. The core innovation lies in the dynamic collaboration between feature completion and graph structure optimization. This mechanism significantly improves the accuracy of node representation and attribute imputation. Extensive experiments on five different datasets demonstrate that the proposed method performs well in node classification tasks. These results confirm the effectiveness of our method in addressing missing attributes in graph data and highlight its potential value in broader graph analysis applications.

The current EGS framework is designed for homogeneous graph structures. Its alternating optimization process relies on matrix inversion, which leads to high computational complexity and poses a bottleneck for large-scale graph data. Experiment results also show that the robustness of the framework under high missing rates needs further improvement. Future work will focus on lightweight optimization methods to enable robust representation learning on heterogeneous graphs with high missing rates. We also aim to explore the practical implementation of this framework to support real-world deployment.