Distilling knowledge from graph neural networks trained on cell graphs to non-neural student models

Acharya, Vasundhara; Yener, Bülent; Beamer, Gillian

doi:10.1038/s41598-025-13697-7

Download PDF

Article
Open access
Published: 10 August 2025

Distilling knowledge from graph neural networks trained on cell graphs to non-neural student models

Vasundhara Acharya¹,
Bülent Yener² &
Gillian Beamer³

Scientific Reports volume 15, Article number: 29274 (2025) Cite this article

3879 Accesses
1 Citations
1 Altmetric
Metrics details

Subjects

Abstract

The development and refinement of artificial intelligence (AI) and machine learning algorithms have been an area of intense research in radiology and pathology, particularly for automated or computer-aided diagnosis. Whole Slide Imaging (WSI) has emerged as a promising tool for developing and utilizing such algorithms in diagnostic and experimental pathology. However, patch-wise analysis of WSIs often falls short of capturing the intricate cell-level interactions within local microenvironment. A robust alternative to address this limitation involves leveraging cell graph representations, thereby enabling a more detailed analysis of local cell interactions. These cell graphs encapsulate the local spatial arrangement of cells in histopathology images, a factor proven to have significant prognostic value. Graph Neural Networks (GNNs) can effectively utilize these spatial feature representations and other features, demonstrating promising performance across classification tasks of varying complexities. It is also feasible to distill the knowledge acquired by deep neural networks to smaller student models through knowledge distillation (KD), achieving goals such as model compression and performance enhancement. Traditional approaches for constructing cell graphs generally rely on edge thresholds defined by sparsity/density or the assumption that nearby cells interact. However, such methods may fail to capture biologically meaningful interactions. Additionally, existing works in knowledge distillation primarily focus on distilling knowledge between neural networks. We designed cell graphs with biologically informed edge thresholds or criteria to address these limitations, moving beyond density/sparsity-based definitions. Furthermore, we demonstrated that student models do not need to be neural networks. Even non-neural models can learn from a neural network teacher. We evaluated our approach across varying dataset complexities, including the presence or absence of distribution shifts, varying degrees of imbalance, and different levels of graph complexity for training GNNs. We also investigated whether softened probabilities obtained from calibrated logits offered better guidance than raw logits. Our experiments revealed that the teacher’s guidance was effective when distribution shifts existed in the data. The teacher model demonstrated decent performance due to its higher complexity and ability to use cell graph structures and features. Its logits provided rich information and regularization to students, mitigating the risk of overfitting the training distribution. We also examined the differences in feature importance between student models trained with the teacher’s logits and their counterparts trained on hard labels. In particular, the student model demonstrated a stronger emphasis on morphological features in the Tuberculosis (TB) dataset than the models trained with hard labels. This emphasis aligns closely with the features that pathologists typically prioritize for diagnostic purposes. Future work could explore designing alternative teacher models, evaluating the proposed approach on larger datasets, and investigating causal knowledge distillation as a potential extension.

Derivation of prognostic contextual histopathological features from whole-slide images of tumours via graph deep learning

Article 18 August 2022

Scalable deep learning artificial intelligence histopathology slide analysis and validation

Article Open access 05 November 2024

Usability of deep learning and H&E images predict disease outcome-emerging tool to optimize clinical trials

Article Open access 15 June 2022

Introduction

Cell graphs have emerged as a powerful tool for capturing the spatial and functional relationships within tissues. They encapsulate cellular and tissue-level architecture by representing cells as nodes and their interactions as edges. They are particularly valuable for bridging the gap between molecular details and their collective impact on larger biological processes, such as wound healing, tumor progression, and immune response¹. The cell-graph technique seeks to uncover the structure-function relationship by modeling the structural organization of tissue using graph theory. For instance, in the context of breast cancer, cancer cells often cluster together to form dense regions of abnormal tissue. This clustering reflects the biological processes underlying tumor growth², such as rapid cell division, altered adhesion properties, and disrupted tissue architecture. By analyzing the spatial distribution and interactions of these clustered cells, the cell-graph approach can provide insights into the functional state of the tissue, such as tumor aggressiveness. This study focuses on three primary cell graph-based datasets: Tuberculosis (TB), Placenta, and Breast Cancer Classification. TB is a highly contagious disease and a leading cause of ill health and mortality worldwide. According to the World Health Organization’s report on TB³, an estimated 1.25 million people succumbed to the disease in 2023. Pulmonary TB, primarily caused by an infectious bacterium, predominantly impacts the lungs through airborne transmission⁴. Granulomas in lung tissue are characteristic of both human and experimental pulmonary tuberculosis^5,6. Identifying acid-fast bacilli (AFB) in stained samples is essential for diagnosing tuberculosis⁷.

Whole-slide imaging makes it easier to digitally examine these stained samples, allowing for high-resolution, in-depth tissue investigation. They preserve fine-grained cellular morphology and local tissue architecture that is often lost through downsampling. Traditional WSI analysis pipelines resort to patch-based processing or downsampling, fragmenting tissue structure and sacrificing essential contextual information⁸. In our approach, we construct cell graphs from whole slide images that integrate local morphological features with spatial context. A deep GNN is then applied to these graphs to learn complex cell interactions, translating the rich WSI content into structured, relational representations. The edge threshold for intercellular communication is crucial in constructing biologically meaningful cell graphs. Incorporating pathologist insights can help refine this threshold, ensuring the graph representation aligns with the underlying cellular interactions. We determined edge thresholds based on the biological rationale for the cell graphs we constructed and validated them through consultations with our domain expert. For the TB dataset cell graphs, nodes represent either acid-fast bacilli (AFB) or the nucleus of activated macrophages, and edge thresholds are based on the length of cords of the M.tb infected cells after 72 hours of infection⁹ and the fact that macrophages extend pseudopods to sense their environment¹⁰. The Placenta dataset represents diverse histological structures essential to placental biology, including various types of trophoblastic villi (TVilli, MIVilli, SVilli, AVilli), Sprouts, Chorion, Maternal cells, Fibrin, and Avascular regions. These structures capture key functional and structural aspects of the placenta. Cell graphs from this dataset reveal how these structures collectively contribute to placental function. Finally, cell graphs from the breast cancer dataset show the spatial arrangement and interactions between tumor cells, lymphocytes, and stromal cells. Edges in these cell graphs were constructed based on factors such as immune surveillance by lymphocytes and the clustering behavior of tumor cells facilitated by adhesion molecules¹¹. They captured important patterns, such as tumor-immune interactions and interactions with stromal cells, essential for understanding disease progression and prognosis.

The cell-graph technique leverages image processing, feature extraction, and machine learning algorithms to establish a quantitative relationship between structure and function¹. Our approach extends this by employing a GNN trained on these cell graphs to learn and model this relationship effectively. Within our proposed graph model, which we term as Cell Graph Jumping Knowledge Neural Network (CG-JKNN), we incorporate the concept of ’jumping knowledge’¹² from GraphSAGE layers. This approach aggregates information from multiple network layers rather than relying solely on the final layer. We enhance the jumping knowledge with GATv2’s attention mechanism to refine this process further. This allows the model to focus on the most informative nodes dynamically.

An important question is whether the knowledge learned by complex deep learning models, such as GNNs in our work, can be effectively distilled into simpler, non-neural network-based models. The answer lies in knowledge distillation (KD), a process where the knowledge from a teacher model (in this case, a GNN) is distilled into student models, typically less complex. Knowledge distillation on graphs brings the advantages of KD into graph learning. This approach primarily serves two objectives: model compression and performance improvement. Model compression focuses on creating a smaller student model than the teacher model. After distillation, the student model achieves a performance comparable to that of the teacher while requiring fewer parameters. Performance improvement focuses on transferring knowledge from the teacher to the student model, aiming to enhance the student’s performance beyond that of a model trained without knowledge distillation¹³. The student model may be smaller, similar, or architecturally different from the teacher. The other main goals of KD are knowledge adaptation and knowledge expansion¹⁴. Knowledge adaptation focuses on helping student networks perform well on new, unseen target domains by using knowledge from teacher networks trained on similar source domains. Knowledge expansion aims to create student networks that are more capable and perform better than the teacher networks. In our work, we focus on model compression and performance improvement. Existing approaches to knowledge distillation mainly focus on neural network-based student models^15,16,17 using their iterative learning capabilities to align with the teacher’s outputs. However, this work demonstrates that knowledge can be distilled to non-neural network-based models, such as tree-based ensemble models. The knowledge that can be distilled can be categorized into various forms, including response-based, intermediate, relation-based, and mutual information-based representations¹⁴. In this work, we focus on response-based knowledge distillation, using the logits generated by a deep GNN as targets to train tree-based ensemble regressor models. These student models are significantly less complex than the teacher. Our primary objective is to evaluate whether the teacher’s guidance through logits provides better insights into the student models than traditional hard labels. We will use the term “Guidance” throughout the paper, which refers to the teacher model’s ability to provide detailed class distinctions and enhance the student model’s performance and generalization through its logits. Literature suggests that students trained on logits are better equipped to mimic the behavior of the teacher model¹⁸. This approach enhances the student’s performance and enables it to be a partial proxy for interpreting the teacher’s decision-making process. In one of our ablation studies, we analyze the differences in feature importance between the student trained on logits and its counterpart trained on hard labels to identify any notable distinctions. To measure the efficacy of this distillation process, we employ a distillation quality metric that balances model complexity and performance. Furthermore, we extend our analysis to explore whether calibration (aligning the probabilities derived from logits with the true likelihood of events) improves the guidance provided by logits. Additionally, we evaluate the efficacy of our approach under varying dataset complexities, including the presence or absence of distribution shifts, imbalanced data, different feature sets, and different levels of training graph complexity. To broaden the applicability of our method, we also test it on datasets beyond cell graphs.

In this study, we addressed key questions to learn the efficacy of knowledge distillation in our proposed framework. Specifically, we sought to answer the following:

Do all student models benefit from knowledge distilled from the teacher GNN trained on cell graphs with local cell graph features and/or morphological features under varying dataset complexities such as the presence of distribution shifts?
Do the features selected by models trained on hard labels differ from those chosen by the students, and can these differences provide insights into the teacher’s guidance?
Can a student model achieve better performance when trained using the combined guidance of the teacher model and the best-performing student, compared to being taught solely by the teacher model?
Can calibration of teacher logits provide better guidance to student models?

The major contributions of this work can be summarized as follows:

Inspired by Fukui et al.¹⁹, we proposed a knowledge distillation framework that uses the logits from a GNN model with jumping knowledge, which acts as the teacher, to train non-neural network models as student models. To our knowledge, this is the first work exploring a teacher trained on cell graphs to guide non-neural network-based student models.
We proposed a method to approximate the number of parameters/complexity of student models using the asymptotic equivalence between the Akaike Information Criterion (AIC) and leave-one-out cross-validation.
We evaluated the efficacy of knowledge distillation under diverse dataset conditions, including varying degrees of imbalance, distribution shifts, and varying graph complexities. We also tested our approach across various feature sets, including combinations of cell graph features and morphological features, individual feature sets (only cell graph features or morphological features), and non-cell graph features.
We explored the impact of post-calibrating logits to enhance the guidance provided by teacher models to student models. We proposed a modified distillation quality metric that effectively measures the quality of knowledge distilled, even in scenarios where the student model outperforms the teacher.
We conducted ablation studies to determine whether the best-performing student model, in combination with the teacher model, could improve guidance. Additionally, we analyzed how feature importance varied when guided by the teacher and explored the biological relevance of these features.

Section “Related works” discusses prior research in the domain. Section “Methods” describes this study’s proposed methodology and framework. Section “Results” presents the experimental results and evaluates the performance of our approach. Section “Discussion and major takeaways” analyzes the implications of our findings and summarizes the key takeaways of this study. Section “Limitations of our work” outlines the limitations of our approach. Section “Conclusion and future work” summarizes the contributions and identifies areas for future work.

Related works

Cell graphs and GNNs trained on cell graphs: applications in disease prediction and classification

Graph construction for modeling cellular interactions often assumes that neighboring cells are more likely to interact. To capture these interactions, methods such as Delaunay triangulation^1,20,21 and K-nearest-neighbor (KNN)^22,23,24,25 are widely employed. The Waxman model²⁶ is another approach that uses an exponential decay function of Euclidean distance to define edges probabilistically. Numerous studies have utilized cell graphs to gain insights into the organization and behavior of cells within tissues. The pioneering work on cell graphs highlighted that the most effective cell-graph construction methods emerge from combining physics-driven and data-driven paradigms¹. The study presented in²⁷ used a computational method using cell-graph evolution to model glioma malignancy. It linked graph phases to cancer severity through connectivity analysis of cell graphs constructed from tissue photomicrographs. The authors in²⁸ presented a computational method to model glioma malignancy using cell-graph topology from tissue images. Cell-graph edges were generated using the Waxman model. By analyzing graph metrics of cancerous cell clusters, the method achieved 85% accuracy at the cellular level and 100% accuracy at the tissue level. An augmented cell-graph (ACG) method for diagnosing malignant glioma from low-magnification tissue images was introduced in²⁹. It represented cell clusters as nodes and their relationships as weighted edges. Tested on 646 brain biopsy samples, the approach achieved 97.53% sensitivity and specificities of 93.33% (inflamed) and 98.15% (healthy) at the tissue level. Gunduz-Demir³⁰ introduced an object-graph-based approach for gland segmentation by leveraging the organizational properties of primitive objects. It achieved high segmentation accuracy when applied to colon tissue images and demonstrated robustness to artifacts and tissue variances. The authors in³¹ introduced a Cell Graph Transformer (CGT) for nuclei classification in histopathology images. A topology-aware pretraining method using a graph convolutional network (GCN) was proposed to learn a feature extractor to address challenges with noisy self-attention scores in complex cell graphs. The study in³² presented sigGCN, a multimodal deep learning model combining a graph convolutional network (GCN) and neural network to integrate gene interaction networks for cell classification. The method outperformed existing traditional approaches in both within-dataset and cross-dataset classifications. Graph neural network-based approach that leveraged cell graphs from multiplexed immunohistochemistry (mIHC) images to predict patient survival and digitally stage gastric cancer was proposed in³³. Edges in the cell graph were established based on the Euclidean distance between cell pairs, connecting cells separated by less than 20 $\mu$m. It outperformed traditional staging systems, achieving high AUC scores (0.960 for binary and 0.771-0.904 for ternary classification). A novel cell-graph convolutional neural network for colorectal cancer (CRC) grading that models large histology images as graphs was proposed in²³. It incorporated both nuclear appearance and spatial information. An edge was placed between two nuclei if they were at a fixed distance from each other. By introducing Adaptive GraphSage for multi-scale feature fusion and a sampling technique to address graph redundancy, CGC-Net effectively captured tissue micro-environment structures. A hierarchical Transformer Graph Neural Network, combining GNN and Transformer architectures, was introduced in²⁴. The main aim was to achieve colorectal adenocarcinoma cancer (CRA) grading using the cell graph that was constructed using the KNN approach. It used a Masked Nuclei Patch (MNP) strategy to train a ResNet-50 to extract representative nuclei features. The transformer module captured long-distance dependencies, achieving state-of-the-art results on CRA grading tasks. The authors in³⁴ proposed Feature-Driven Local Cell Graphs (FeDeG) for constructing cell graphs by integrating spatial proximity and nuclear attributes like shape, size, and texture. Graph-derived metrics extracted from FeDeGs were used with a linear discriminant classifier, achieving an AUC of 0.68. A Hierarchical Cell-to-Tissue (HACT) graph representation utilizing the cell graphs was proposed in³⁵. The tissue structure and functionality were modeled using a novel hierarchical graph neural network (HACT-Net). Using the Breast Carcinoma Subtyping (BRACS) dataset, HACT-Net outperformed state-of-the-art methods and individual pathologists.

Knowledge distillation in graphs

With the demand for efficient models, KD is an ever-developing field. Among the various types of information that can be distilled, including logits, embeddings, and graph structures, we specifically use logits as the training labels for the student models. Many works have focused on transferring logits as a form of knowledge in knowledge distillation. The authors in³⁶ systematically compared different knowledge sources–features, logits, and gradients in knowledge distillation by approximating the KL-divergence criterion. They analyzed their effectiveness in model compression and incremental learning and found that logits were generally more efficient. Recently, a refined knowledge distillation method that employed labeling information to refine teacher logit dynamically and to eliminate misleading information from the teacher was introduced in³⁷. Distilling graph structure information involves transferring knowledge about the connectivity and relationships between nodes and edges³⁸, which is crucial for modeling graph data. Additionally, some works distill learned node embeddings from the intermediate layers of teacher models to guide the student model’s learning.

In the context of knowledge distillation, various setups exist to transfer knowledge. There are teacher-free networks where the student model learns independently without a teacher. In teacher-to-student networks, the knowledge transfer can involve one or multiple teachers guiding the students. Additionally, distillation can be categorized as offline or online. Online distillation refers to a scheme where the teacher and student models are trained simultaneously in an end-to-end manner. In contrast, offline distillation involves a pre-trained teacher model that facilitates the student’s training without undergoing further updates. In our study, we utilize a teacher-to-student setup with two configurations: a single teacher guiding the student and a combination of the teacher and the best-performing student acting as teachers. Additionally, our approach falls under the category of offline distillation, as the teacher models are pre-trained and remain unchanged during the training of the student models.

Numerous works have been conducted to highlight the use of knowledge distillation in graphs. In³⁹, the authors proposed a method for compressing a k-layered graph convolution network (GCN) by repeating a single GCN layer k times and distilling both the logits and final node embeddings. The authors in⁴⁰ used two heterogeneous teacher models to distill their embeddings via a topological attribution map and logits. In⁴¹, the authors trained a teacher on offline graph snapshots with a self-attention mechanism to distill to a smaller, more efficient student model making predictions on online graph snapshots. A neighbor distillation method to distill local structure knowledge and to use peer node information to learn the local structure was proposed in⁴². The approach in⁴³ used logit distillation and auxiliary representation distillation methods such as Locality Structure Preserving distillation (LSP)⁴⁴. In⁴⁵, the authors used adversarial training for KD by applying a discriminator to the embeddings and logits of the student and teacher models. The authors in⁴⁶ proposed a method for fair distillation where a student model learned both the distilled logits and a proxy for bias from the teacher, which was removed during testing with the rationale that it contained most of the information on bias and its exclusion would result in fair predictions. An interesting logits-based KD method termed Decoupled Graph Knowledge Distillation (DGKD) was proposed in⁴⁷. It reformulated the distillation loss into the components of target class (TCGD) and non-target class (NCGD). By decoupling the fixed weight between these losses and addressing their negative correlation, DGKD dynamically adjusted the weights for different data samples. This led to improved prediction accuracy for student MLP. The authors in⁴⁸ proposed Knowledge Distillation for Graph Augmentation (KDGA) that mitigated the adverse effects of distribution shifts caused by graph augmentation. KDGA transferred knowledge from a GNN teacher trained on augmented graphs to a partially parameter-shared student tested on the original graph. This helped to improve performance across various GNN architectures and augmentation methods. In⁴⁹, they transferred knowledge from two specialized teacher models, one focused on features and the other on structure, using a teacher-student distillation framework. The feature-level teacher guided the student on completing and leveraging node features, while the structure-level teacher focused on graph topology. However, these works primarily focused on distilling knowledge from a GNN to another GNN or other neural networks. In¹⁹, the authors proposed a distillation method that utilized information extracted from neural networks to train non-neural network models, such as support vector machines, random forests, and gradient-boosting decision trees. Their study was limited to a single image-based dataset and did not provide a detailed analysis of why specific student models failed to achieve the desired performance when trained with logits obtained from the teacher CNN. Moreover, they evaluated their approach using only two out of ten available classes for simplicity, which does not adequately demonstrate the efficacy of KD in a multiclass setting.

Problem statement

Currently used methods for building cell graphs typically use a single-edge threshold to represent every interaction between cells. These thresholds are often chosen based on factors such as achieving denser graphs. However, this approach overlooks the biological diversity of interactions, as different cell types exhibit distinct interaction patterns that a uniform edge threshold cannot adequately capture. A more biologically informed methodology for defining these thresholds is necessary to better reflect the underlying cellular relationships. In the context of knowledge distillation from GNNs, most existing works focus on transferring knowledge from GNNs to other neural networks. However, student models need not be limited to neural networks. They can include non-neural models. Furthermore, evaluating the efficacy of knowledge distillation in our specific setup requires a broader understanding of its behavior under varying dataset complexities, including scenarios with distribution shifts, multiple classes, and other challenges.

Methods

Datasets based on cell graphs and non-cell graphs

For this work, we utilized three cell graph-based datasets: one from our previous paper on tuberculosis (TB)⁵⁰, another dataset from placenta histology⁵¹, and lastly, the TCGA Breast Cancer Cell Classification Dataset (BRCA-M2C)⁵². The TB dataset contained 44 whole slide images (WSIs) with an average size of 42,831 x 41,159 pixels at 40X magnification. The nodes were classified into acid-fast bacilli (AFB) and the nucleus of activated macrophages. The approach used to determine the cell locations and classify the cell types is detailed in our previous work⁵⁰. We used 34 WSIs for training and validation, while 10 WSIs were reserved for the test set. The train and test WSIs used in this study differed from those proposed in⁵⁰. The training set had 90878 nodes, the validation set had 22708 nodes, and the test set had 76316 nodes.

The placenta dataset consisted of two cell graphs constructed from two placenta histology WSIs, combined into a single graph with nine classes. We utilized the original 64-dimensional feature set provided with the dataset for our analysis. These features primarily focussed on the morphological characteristics of the cells. Our goal was to evaluate the efficacy of knowledge distillation with cell graph datasets where the cell graph features were not included in the training process. The process of feature extraction is described in⁵¹. Additionally, we followed the dataset’s original train, validation, and test split (considering only labeled nodes).

The BRCA-M2C dataset (Breast Cancer Dataset)⁵² provided dot annotations for multi-class cell classification in breast cancer images, including the annotated cells’ coordinates and corresponding labels. The cell extraction and labeling process can be found in⁵². These images were patches extracted from 1000x1000 pixels at the highest resolution and downsampled to 20x. All images were around 500x500 pixels. The cell classes included lymphocytes, breast cancer cells, and stromal cells. There were 80 image data (coordinates of the annotated cells along with their corresponding labels) under the training set, 10 image data under the validation set, and the test set consisted of 30 image data. We combined training and validation data while keeping the test data unchanged. This resulted in 19602 training nodes, 2178 validation set nodes, and 8858 test set nodes.

To determine the generalizability of our approach to non-cell graph-based datasets and in the absence of features extracted from cell graphs, we used three non-cell graph-based datasets: CoauthorCS, CoauthorPhysics and a synthetic dataset. These datasets consisted of a single graph. The CoauthorCS dataset consisted of 18,333 nodes and 163,788 edges, with nodes divided into 15 classes. A 6,805-dimensional feature vector represented each node. The training set had 12833 nodes, the validation set had 3666 nodes, and the test set had 1834 nodes. Similarly, the CoauthorPhysics dataset contained 34,493 nodes and 495,924 edges, with nodes categorized into five classes. Node features in this dataset were 8,415-dimensional vectors. The training set had 24145 nodes, the validation set had 6898 nodes, and the test set had 3450 nodes. These datasets were only used to evaluate the applicability of our approach to non-cell graph settings and were not included in ablation studies. We generated a synthetic dataset of 60,000 nodes using the preferential attachment mechanism of the Barabási-Albert model⁵³. Seven topological features were extracted for this graph to represent its structural properties. The dataset training set contained 42,000 nodes, 12,000 nodes were present in the validation set, and 6,000 nodes were present in the test set, respectively.

Generally, datasets with a minority class proportion between 20% and 40% are considered to have mild imbalance, those with proportions from 1% to 20% are categorized as moderately imbalanced, and datasets with a minority class proportion of less than 1% are considered extremely imbalanced⁵⁴. Based on this classification, TB and Breast cancer datasets had a mild imbalance. The Placenta, CoauthorCS, and Synthetic datasets demonstrated extreme class imbalance. The CoauthorPhysics dataset had a moderate imbalance.

Construction of cell graph

Edge construction in cell graphs estimates the biological likelihood that neighboring cells interact within the same structure. The edge threshold for intercellular communication is critical in cellular studies, and many investigations have aimed to determine the optimal distance for accurately modeling these interactions. Pathologists’ input provides valuable guidance to refine graph representations and ensure they accurately reflect the biological relationships between cells⁵⁵. Many prior works have employed a single threshold value to map cell-cell interactions^23,33, while some have experimented with varying edge thresholds, such as 60, 75, and 90 $\mu$m, to identify an appropriate threshold value⁵⁶. In contrast, our approach uses distinct threshold values for each cell-cell pair,

In the TB dataset, nodes represent either AFBs or the nucleus of activated macrophages. Edge thresholds were based upon the length of cords of the M.tb infected cells after 72 hours of infection⁹ and the fact that macrophages can extend their pseudopods beyond their normal boundary (radius) to detect other cells farther away. We hypothesize that AFBs can interact with other AFBs within a distance of 150 $\mu$m, equivalent to 615 pixels at the magnification used in this study⁵⁷. Likewise, activated macrophage nuclei may interact with both AFBs and each other if they are within 500 $\mu$m (2049 pixels)¹⁰. Our domain expert has thoroughly reviewed and validated these threshold values.

The adjacency matrix is computed as follows:

$$A_{i j}\left\{ \begin{array}{lc} 1 & \text{ if } Distance(u, v)<d \\ 0 & \text{ otherwise. } \end{array}\right.$$

Distance denotes euclidean distance computing using the equation 1. The coordinates $(x_u,y_u)$ belongs to node ’u’ and the coordinates $(x_v,y_v)$ belons to node ’v’ in the image.

$$\begin{aligned} d(u, v)=\sqrt{\left( x_u-x_v\right) ^2+\left( y_u-y_v\right) ^2} \end{aligned}$$

(1)

The distance threshold values are tabulated in the Table 1.

Table 1 Distance thresholds.

Subjects

Abstract

Similar content being viewed by others

Derivation of prognostic contextual histopathological features from whole-slide images of tumours via graph deep learning

Scalable deep learning artificial intelligence histopathology slide analysis and validation

Usability of deep learning and H&E images predict disease outcome-emerging tool to optimize clinical trials

Introduction

Related works

Cell graphs and GNNs trained on cell graphs: applications in disease prediction and classification

Knowledge distillation in graphs

Problem statement

Methods

Datasets based on cell graphs and non-cell graphs

Construction of cell graph

Are all these edges required?

Feature extraction

TB dataset

Placenta dataset

BRCA-M2C dataset

Distilling the knowledge from CG-JKNN (teacher) to tree-based ensembles (students)

Estimating the complexity of tree-based ensemble models-an approximation and distillation quality score

Ablation studies

Combining teacher and top student: ensemble model training

Feature importance: comparing students trained with and without teacher guidance

Comparing effectiveness of knowledge distillation into ANN vs. non-neural student models

Generalizability of knowledge distillation under various dataset complexities

Graph complexity

Distribution shift in the data

Can logit calibration enhance student guidance?

Experimental setup and hyperparameters

Results

Covariate and label shift across datasets

Performance of student models trained on TB dataset

Performance of student models trained on placenta dataset

Performance of student models trained on TCGA breast cancer cell classification dataset

Ablation study results

Feature importance: comparing students trained with and without teacher guidance

Training with ensembled output

A comparative analysis of knowledge distillation in neural and non-neural student models

Results of generalizability on non-biological graph datasets

Performance on CoauthorPhysicsDataset

Performance on CoauthorCSDataset

Performance on synthetic dataset

Factors influencing the efficacy of our approach across datasets

Effectiveness of our approach: successes, limitations, and when it might not be too useful

Discussion and major takeaways

Limitations of our work

Conclusion and future work

Data availability

References

Acknowledgements

Author information

Authors and Affiliations

Contributions

Corresponding author

Ethics declarations

Competing interests

Additional information

Publisher’s note

Supplementary Information

Supplementary Information.

Rights and permissions

About this article

Cite this article

Share this article

Keywords

This article is cited by

Multimodal learning of melt pool dynamics in laser powder bed fusion

Search

Quick links