Attention-based deep learning for accurate cell image analysis

Gao, Xiangrui; Zhang, Fan; Guo, Xueyu; Yao, Mengcheng; Wang, Xiaoxiao; Chen, Dong; Zhang, Genwei; Wang, Xiaodong; Lai, Lipeng

doi:10.1038/s41598-025-85608-9

Article
Open access
Published: 08 January 2025

Scientific Reports volume 15, Article number: 1265 (2025)

Abstract

High-content analysis (HCA) holds enormous potential for drug discovery and research, but widely used methods can be cumbersome and yield inaccurate results. Noisy and redundant signals in cell images impede accurate deep learning-based image analysis. To address these issues, we introduce X-Profiler, a novel HCA method that integrates cellular experiments, image processing, and deep learning modeling. X-Profiler combines a convolutional neural network and a Transformer to encode high-content images, effectively filtering out noisy signals and precisely characterizing cell phenotypes. In comparative tests on drug-induced cardiotoxicity, mitochondrial toxicity classification, and compound classification, X-Profiler outperformed both DeepProfiler and CellProfiler, two highly recognized and representative methods in this field. Our results demonstrate the utility and versatility of X-Profiler, and we anticipate its wide application in HCA for advancing drug development and disease research.

Introduction

Developing novel pharmaceutical drugs represents a substantial investment of time and resources, yet the success rate is low. The major obstacles hindering drug development include ineffective drug activity, intractable drug toxicity, and marketing difficulties^1,2. Medical research focuses on identifying the causes and deciphering the mechanisms of disease, aiming eventually to develop accurate diagnostic tools and effective treatments. In recent decades, high-throughput methods have emerged that can generate enormous data sets in a short amount of time and in a cost-effective manner³. The exploitation of so-called ‘big data’ will enable research projects that were never previously possible, but it should also stimulate a re-evaluation of all our data practices. The high throughput and affordability of current technologies such as omics and imaging, which allow quantitative measurements of many putative targets (e.g. DNA, RNA, protein, metabolite), have exponentially increased the volume of scientific data available for this arduous task⁴. Data-driven medicinal chemistry approaches have the potential to improve decision making in drug discovery projects, provided that all researchers embrace the role of ‘data scientist’ and uncover the meaningful relationships and patterns in available data⁵. The success of deep learning models relies heavily on the availability of vast amounts of data and advanced algorithms⁶. Massive data helps to train the model effectively without premature overfitting, while a well-designed and sufficiently complex network structure endows the model with good memory and information extraction capability⁷. In addition, the diversity of large data sets makes model training and prediction easier by eliminating complex preprocessing steps such as segmentation, tracking, tracing, and spatial conversion. As a result, the task can be performed through end-to-end prediction, simplifying the entire process⁸.

Cellular morphology is a potentially rich data source for interrogating biological perturbations, especially at a large scale^9,10,11. For example, cellular morphological profiling of compounds has been used to determine their mechanism of action^8,12,13, identify their targets^14,15, discover relationships with genes^16,17, and characterize cellular heterogeneity¹⁸. Genes have been analyzed by creating profiles of cell populations in which the gene is perturbed by CRISPR and RNA interference (RNAi); these profiles have been used to represent the functional landscape of essential human genes^19,20,21, identify genetic interactions^22,23, and characterize cellular heterogeneity²⁴. Recently, the Joint Undertaking for Morphological Profiling (JUMP)-Cell Painting Consortium released a dataset that contains images and profiles from the Cell Painting assay for over 116,750 unique compounds, over-expression of 12,602 genes, and knockout of 7,975 genes using CRISPR-Cas9 in human osteosarcoma cells (U2OS)²⁵. One of the main tools currently used for image processing and interpretable feature extraction is CellProfiler^26,27, developed by the imaging platform of the Broad Institute. It is designed to enable biologists without training in computer vision or programming to quantitatively measure phenotypes from thousands of images automatically. A wide variety of measurements can be generated for each identified cell or subcellular compartment, including morphology²⁸, intensity, and texture²⁹, among others. Another feature extraction tool, the deep learning-based DeepProfiler presented by Cimini and Carpenter et al., used weakly supervised learning (WSL) to model associations between images and treatments, and showed that the learned representation encodes both confounding factors and phenotypic features³⁰.

Despite the obvious advantages of ‘big data’, the field has to overcome important challenges: most of the analyses performed have inherent biological and experimental errors and rely on capturing a snapshot of complex and dynamic biological systems³¹. Consequently, untimely sample collection, whether due to an incorrect experimental design or simply to randomness, can introduce so much ‘noise’ that inciting factors or errors cannot be clearly identified³. During cellular phenotypic experiments, the compound effects on cells are random, and the properties of compounds cannot be accurately distinguished from images of one or a few cells³². Therefore, a ‘noise’ filter is needed to adaptively filter out irrelevant cell-slice images and accurately extract features for cell phenotype characterization. Transformers use a self-attention mechanism to weigh the importance of each token in the context of the entire input sequence, which allows the model to capture long-range dependencies and relationships between tokens³³. To address the aforementioned issues, we combined a convolutional neural network with a Transformer to develop X-Profiler, a deep learning-based method for ‘noise’ filtering and downstream task-related feature extraction that extracts valuable information from high-content images for drug analysis.

In this study, we conducted a comprehensive analysis of cell phenotype images, covering aspects from data processing to model training and validation. To evaluate the potential applicability of our method in high-content image processing, we chose three widely recognized downstream tasks in drug discovery: Task 1, determining drug inhibition of hERG (the human Ether-à-go-go-Related Gene) ion channels; Task 2, predicting drug-induced mitochondrial toxicity; and Task 3, classifying compounds. In all tasks, X-Profiler consistently surpassed state-of-the-art (SOTA) performance, demonstrating more accurate and effective feature extraction capabilities.

Results and discussion

Model architecture of X-Profiler

In this study, we address the issue of high impurity in the dataset by introducing a novel deep learning framework. Instead of expecting each individual slice of the dataset to conform directly to a desired label mapping, we explicitly assign them distinct rankings and weights. Formally, we utilized cell painting data to train and verify our model, segmenting large cellular images into individual slices based on the positional information of cells and constructing intercellular associations by arbitrarily selecting portions of single slices. We introduced the concept of a “patch”—bundles of different single-cell slice images subjected to the same perturbation. A patch is derived from the images of cells obtained from a single well in a wet lab experiment, stained and processed using Cell Painting. After cell center localization, images are cropped to a size of 96 × 96 pixels centered on the cells. Due to the density of cells, some images may include multiple adjacent cells. To develop an efficient and robust method, we utilized deep learning models to autonomously extract features from these cell images. The information between cells is valuable, so we do not solely rely on segmented images containing only one cell. As a result, we obtained a large number of cell images, most of which contain a single cell and its surrounding cytoplasm, with a few containing multiple cells.
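The cropping step described above can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the function names `crop_window` and `crop_windows` are hypothetical, cell centers are assumed to be given as integer pixel coordinates, and the rule of skipping cells whose window would cross the image border is an assumption, since the text does not describe border handling.

```python
from typing import List, Optional, Tuple

CROP_SIZE = 96  # crop edge length in pixels, as stated in the text

Window = Tuple[int, int, int, int]  # (left, top, right, bottom), right/bottom exclusive


def crop_window(center_x: int, center_y: int, width: int, height: int,
                size: int = CROP_SIZE) -> Optional[Window]:
    """Return the size x size window centered on a cell, or None at the border.

    Skipping border cells is an assumed policy, not taken from the source.
    """
    half = size // 2
    left, top = center_x - half, center_y - half
    right, bottom = left + size, top + size
    if left < 0 or top < 0 or right > width or bottom > height:
        return None
    return left, top, right, bottom


def crop_windows(centers: List[Tuple[int, int]], width: int, height: int) -> List[Window]:
    """Compute crop windows for all cell centers that fit inside the image."""
    windows = []
    for cx, cy in centers:
        window = crop_window(cx, cy, width, height)
        if window is not None:
            windows.append(window)
    return windows
```

Because windows are placed on cell centers without segmentation masks, crops of neighboring cells can overlap, which is consistent with the observation that some slices contain multiple adjacent cells.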

During training, we randomly sampled several cell images under the same compound treatment from these images to form a “patch.” By employing a ResNet to extract image features and utilizing a transformer to capture the relational patterns within the same patch, we can identify and exclude poor-quality images, thereby enhancing the model’s accuracy.

The training and validation processes of the model are illustrated in Fig. 1a. During these processes, 6 to 12 single-cell image slices were bundled into a patch and fed into the model in batches.
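The patch construction described above can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `build_patches`, the `(treatment, slice_id)` input format, the fixed random seed, and the decision to drop an incomplete trailing group are all assumptions. The default patch size of 8 lies within the range stated in the text.

```python
import random
from collections import defaultdict
from typing import Dict, Hashable, List, Tuple


def build_patches(slices: List[Tuple[Hashable, int]], patch_size: int = 8,
                  seed: int = 0) -> List[Tuple[Hashable, List[int]]]:
    """Bundle single-cell slices that share a treatment into fixed-size patches.

    slices: (treatment, slice_id) pairs; slice_id is a hypothetical identifier.
    Returns (treatment, [slice_id, ...]) pairs, each holding patch_size slices.
    """
    rng = random.Random(seed)
    by_treatment: Dict[Hashable, List[int]] = defaultdict(list)
    for treatment, slice_id in slices:
        by_treatment[treatment].append(slice_id)

    patches = []
    for treatment, ids in by_treatment.items():
        shuffled = ids[:]
        rng.shuffle(shuffled)  # random sampling without replacement
        # Consecutive chunks of the shuffled list; a short remainder is dropped (assumption).
        for start in range(0, len(shuffled) - patch_size + 1, patch_size):
            patches.append((treatment, shuffled[start:start + patch_size]))
    return patches
```

Every patch therefore carries a single label, that of its shared treatment, while the individual slices inside it are free to differ in quality.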

In each epoch, which is a full pass through the dataset, the model performs a forward pass to make predictions and then calculates the loss by comparing these predictions to the actual labels. The backward pass follows, in which backpropagation computes the gradient of the loss with respect to each weight. The optimizer then updates the model’s weights to minimize the loss. After each epoch, the model’s performance is evaluated on a separate validation set, which helps monitor its ability to generalize to unseen data. During the prediction phase (Fig. 1b), we randomly sampled the images and packaged single-cell images into patches of 10 to 750 units. The model then performs a forward pass and uses the output of the linear layer for score prediction. Along with the model’s predictions, we also obtain a ranking for each individual cell image within its patch, which completes the data processing.

X-Profiler’s performance on hERG inhibition, mitochondrial toxicity and compound classification

We designed three tasks to evaluate the capability of X-Profiler. For the hERG inhibition task, we selected 100 compounds (47 toxic and 53 non-toxic) for high-content screening (HCS) experiments and generated corresponding high-content images using a high-content imager. We chose 38 compounds (19 toxic and 19 non-toxic) as the test dataset for determining drug inhibition of hERG ion channels. The high-content images were used to assess the hERG inhibition classification capabilities of three models: X-Profiler, DeepProfiler, and CellProfiler. X-Profiler outperformed the feature extractors DeepProfiler and CellProfiler, achieving an accuracy of 90.6% in cell image classification, with precision, recall and F1 scores of 80.0%, 95.5% and 86.2%, respectively (Fig. 2a). Moreover, a UMAP landscape of the compound features showed two separate clusters for hERG channel inhibitors and non-inhibitors (Fig. 2b). The silhouette score (ss) and adjusted Rand score (ars) of the clusters of toxic and non-toxic compounds were 0.446 and 0.272, respectively, indicating that X-Profiler can capture the cell phenotype changes induced by hERG channel inhibitors.

For the mitochondrial toxicity task, we mapped all 100 compounds to the Tox21 dataset annotation³⁴. Of these, 77 compounds bore additional mitochondrial toxicity labels (Supplementary Table 2). We selected a test set of 15 compounds, of which 1 was labeled ‘Active’, 3 were labeled ‘Inactive’, and 11 were labeled ‘Inconclusive’. Mitochondrial toxicity caused by certain compounds is a widespread form of organ toxicity, which can result in multiorgan damage in the heart, liver, bone, and brain^35,36,37. The alterations in cell morphology, texture, and intensity caused by compounds are strongly correlated with mitochondrial toxicity³⁸. We evaluated the classification performance of the three models. X-Profiler outperformed the other two models in accuracy, recall, precision and F1 score, with values of 54.5%, 52.4%, 47.3% and 43.3%, respectively, as shown in Fig. 2c. The landscape for mitochondrial toxicity is shown in Fig. 2d; the silhouette score and adjusted Rand score of the clusters of active, inactive and inconclusive compounds were 0.023 and 0.083, respectively. There may be two reasons for the suboptimal performance on mitochondrial toxicity: (1) the number of labeled drugs decreased while the number of classes increased, and less data implies lower model robustness; (2) the effects of mitochondrial dysfunction are often confusingly cell type-specific, as is the case for the majority of known mitochondrial diseases³⁹. Human HepG2 cells were used for labeling, whereas iPSC-CMs might exhibit different toxicity reactions to the same compounds.

The compound classification task involved classifying images based on the cell response to different compounds. For each compound, 20% of the images were used for testing and 80% for training. X-Profiler outperformed DeepProfiler and CellProfiler, achieving an accuracy, precision, recall and F1 score of 0.867, 0.892, 0.871 and 0.881, respectively. X-Profiler thus demonstrates a superior ability to discern relationships among images (Fig. 2e and f).

Comparison of distance and angle for high-dimensional features

In our investigation, we quantified the distance and cosine value between the high-dimensional image features of various compounds at differing concentrations (X-Profiler: 512-d; DeepProfiler: 672-d; CellProfiler: 613-d). We aimed to better understand the model’s capacity to discriminate between cellular features after pharmacological intervention. We determined the cosine similarity of compounds with DMSO, as illustrated in Fig. 3a. Cosine values range from −1 to 1, corresponding to angles between 180 degrees and 0 degrees. The cosine values of toxic and non-toxic compounds corresponded to distinct angles within the high-dimensional space. Although the Mahalanobis distance of the compounds relative to DMSO did not reveal a marked difference between toxic and non-toxic compounds, the Mahalanobis distance within a single compound increased with concentration, and these values were used to deduce the IC50 value for each compound (Fig. 3b). A comparative analysis of X-Profiler with CellProfiler and DeepProfiler is presented in Fig. 3c. The distribution of the absolute differences between true and predicted IC50 values underscores X-Profiler’s superior precision, with a mean and standard deviation across all compounds of 2.66 and 2.94, respectively. In contrast, DeepProfiler and CellProfiler yielded values of 4.45 ± 4.84 and 3.48 ± 3.56, respectively. The IC50 prediction curves for the compound carvedilol from X-Profiler, DeepProfiler, and CellProfiler serve as a representative example, highlighting X-Profiler’s higher accuracy in determining IC50 values (Fig. 3d). By inferring IC50 values from the calculated feature distances, we show that X-Profiler has great potential to replace traditional IC50 detection methods.
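The cosine comparison against the DMSO control can be illustrated with a short sketch. The helper names `cosine_similarity` and `angle_degrees` are hypothetical, the vectors are toy stand-ins for the 512-d feature vectors, and the Mahalanobis distance is deliberately left out because it additionally requires an estimated covariance matrix. The angle recovered from a cosine value through the arccosine always lies between 0 and 180 degrees.

```python
import math
from typing import Sequence


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between two feature vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


def angle_degrees(u: Sequence[float], v: Sequence[float]) -> float:
    """Angle between two vectors in degrees, always within [0, 180]."""
    # Clamp to guard against floating-point values slightly outside [-1, 1].
    cos = max(-1.0, min(1.0, cosine_similarity(u, v)))
    return math.degrees(math.acos(cos))


# Toy example: a hypothetical DMSO reference and a hypothetical compound feature.
dmso_feature = [1.0, 0.0, 0.0]
compound_feature = [0.0, 2.0, 0.0]
similarity_to_dmso = cosine_similarity(compound_feature, dmso_feature)
```

Cosine similarity ignores vector length, so it captures the direction of a phenotypic shift away from DMSO, whereas a distance measure also reflects its magnitude.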

Multiple cell slices enable accurate grasp of the compound property

During the feature extraction phase employed by X-Profiler, we obtained a class embedding for a patch of images, referred to as ‘multiple cells features’ (cls embedding), along with the embedding for each individual image in the patch, termed ‘single cells features’. Our research conducted separate analyses on multiple cells and single cells image features in compound classification tasks. Additionally, we compared the landscape of X-Profiler feature with DeepProfiler and CellProfiler. Herein, the multiple cells features aggregate the mean of 100 single cell slice attributes, whereas the single cells features are a stochastic selection from the entirety of single cell slices (Fig. 4a–f). We selected the first ten compounds (alphabetically sorted) from a set of 100 compounds for the t-SNE plot illustration and presented compound distance in two-dimensional space.

In our results, CellProfiler was unable to differentiate changes in cellular phenotype between compounds, whether in single cells feature clusters or multiple cells feature clusters (Fig. 4e, f). The single cells feature clusters generated by DeepProfiler also failed to distinguish between compounds (Fig. 4d). For both X-Profiler and DeepProfiler, the separation between compound features is clearer in the landscape of multiple cells features than in that of single cells features. For instance, benzethonium chloride and betrixaban appear as a single cluster in the single cells feature landscape (Fig. 4a), whereas in the landscape of multiple cells features they form discrete clusters (Fig. 4b). This suggests that integrating a greater number of image slices can yield more comprehensive features to accurately predict compound properties. Furthermore, the multiple cells features of DeepProfiler merely attain the level of clustering that X-Profiler achieves with single cells features (Fig. 4c). We believe that the convolutional networks within X-Profiler and DeepProfiler might yield more effective features for the analysis of cell painting data than traditional statistical algorithms such as CellProfiler, notwithstanding the limited interpretability inherent to these features. Nevertheless, X-Profiler exhibits superior predictive capacity compared with DeepProfiler. The attention mechanism within X-Profiler improves its aggregation of cellular features, outperforming DeepProfiler, which only calculates the mean of individual cell features. This underscores the importance of leveraging the transformer architecture to enhance feature extraction from intricate and unstructured data.
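The difference between mean aggregation and attention-based aggregation can be illustrated with a small sketch. It is a deliberate simplification and not the X-Profiler architecture: a single attention head without learned projections is assumed, `query` stands in for the class (CLS) token embedding, and the function names and toy embeddings are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def mean_pool(embeddings: Sequence[Sequence[float]]) -> List[float]:
    """Average the slice embeddings, giving every slice the same weight."""
    count = len(embeddings)
    return [sum(column) / count for column in zip(*embeddings)]


def attention_pool(query: Sequence[float],
                   embeddings: Sequence[Sequence[float]]) -> Tuple[List[float], List[float]]:
    """Weight each slice by scaled dot-product similarity to the query."""
    dim = len(query)
    scores = [sum(q * x for q, x in zip(query, emb)) / math.sqrt(dim)
              for emb in embeddings]
    weights = softmax(scores)
    pooled = [sum(w * emb[i] for w, emb in zip(weights, embeddings))
              for i in range(dim)]
    return pooled, weights


# Toy patch: two slices aligned with the query and one off-target slice.
patch = [[4.0, 0.0], [4.0, 0.0], [0.0, 4.0]]
pooled, weights = attention_pool([1.0, 0.0], patch)
# Slices ordered from most to least attended.
ranking = sorted(range(len(weights)), key=lambda i: weights[i], reverse=True)
```

In this toy patch the off-target slice contributes far less to the attention-pooled vector than to the plain mean, and the weights themselves give a per-slice ranking of the kind that can be used to filter slices.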

We compared single cell-slice images from four compounds (bedaquiline, belzutifan, carvedilol, and daclatasvir) selected from Fig. 4a. The t-SNE landscape revealed that the cellular embeddings of belzutifan, carvedilol, and daclatasvir formed a tight cluster, markedly distinct from bedaquiline. Likewise, the cellular images of belzutifan, carvedilol, and daclatasvir appeared indistinguishable upon visual inspection (Fig. 4g) and displayed comparable intensities of Reactive Oxygen Species (ROS) and Mitochondrial Membrane Potential (MMP) under varying concentrations, a characteristic not shared with bedaquiline (Fig. 4h).

In an endeavor to comprehend the results more profoundly, we researched the mechanism of action (MOA) of the three compounds. Despite diverging therapeutic indications - belzutifan for Von Hippel-Lindau (VHL) syndrome-associated clear-cell renal cell carcinoma (ccRCC), carvedilol for hypertension, and daclatasvir for Chronic hepatitis C genotype 3 (GT-3 HCV) - their MOAs suggest that they may exert similar effects on cardiomyocytes. Belzutifan inhibits hypoxia-inducible factor 2α (HIF-2α), and research suggests that HIF-2α inhibitors can reverse pulmonary hypertension and that there are shared pathophysiologic mechanisms between cancer and heart failure^40,41. Carvedilol, a nonselective beta-adrenergic antagonist, has anti-free radical and antioxidant effects, and can resist oxidation and reduce ROS production⁴². It further impedes the cardiac Mitochondrial Permeability Transition (MPT), which can depolarize mitochondrial membranes and uncouple oxidative phosphorylation (OXPHOS)⁴². Daclatasvir, a pangenotypic NS5A replication complex inhibitor with a dual antiviral effect, inhibits RNA replication and viral assembly. Cellular ROS levels rise during HCV infection, and evidence suggests that anti-RNA viral drugs are associated with intracellular ROS levels^43,44. Given that cell phenotype reflects the proper classification of drug mechanisms, the application of deep learning holds substantial promise for future endeavors in drug discovery and therapeutic indication investigations.

X-Profiler can act as an effective cell-slice filter

To showcase the capability of X-Profiler in distilling the most salient single-cell images, we generated an attention heatmap using X-Profiler, which highlighted the cells pertinent to the task throughout the entirety of the image concerning the cardiotoxicity task (Supplementary Fig. 1a and Supplementary Fig. 1b). As indicated by the self-attention mechanism, the similarity between the embeddings of single cell-slice image and the embeddings of the CLS token is positively associated with the classification task⁴⁵. In the heatmap, cells that are more relevant to the downstream task are represented by brighter and warmer pink dots, and these dots clearly illustrate the differences between different cells in terms of task correlation. In assessing the Transformer’s capacity to segregate specific cell slices, we analyzed the attributes of task-relevant single-cell slice images alongside those deemed irrelevant. We presented the landscape of relevant and irrelevant single cells image features in Fig. 5a, in which the dots symbolize task-related images, while inverted triangles represent those unrelated to the task. The single cells image features unrelated to the downstream tasks were indistinguishable among multiple compounds, while the single cells image features related to tasks were separately clustered and widely spaced apart in accordance with their respective compounds. This demonstrates that the multiple cell slices encoder can operate as a cell filter. Two ancillary observations emerged from this clustering exercise: (1) In cellular experiments involving various compounds with distinct properties, there are often both dead cells and unaffected normal cells that form a common group and do not contribute to classification. After passing through the single-cell slice encoder, the features of these cells are similar; (2) Upon exposure to different compounds, the cell phenotypes undergo more significant changes, making them markedly distinct from each other.

To further demonstrate the screening capabilities of X-Profiler for cell phenotype, we compared cell-slice images that were considered relevant and irrelevant based on the X-Profiler attention map (Fig. 5b). Each cell-slice image comprised three channels: nucleus (DNA), mitochondrial membrane potential (MMP) and mitochondrial reactive oxygen species (ROS). Task-relevant images exhibited more accurate staining, higher fluorescence intensity, and greater contrast between channels than the task-irrelevant images. Moreover, relevant images exhibited more distinct cell structures in the ROS and MMP channels, leading to more accurate cell identification. In terms of image quality, task-relevant images were clear and free of impurities, in stark contrast to the irrelevant images, which exhibited a halo on the image surface caused by optical structural artifacts, resulting in unclear images and high noise levels.

Cell phenotype images are distinct from general images in two key ways. First, they contain a high degree of redundancy, with dark backgrounds and bright cells dominating most of the image. As a result, it can be challenging to distinguish differences between cells using generic training. Second, cell phenotype images are often beset with noise resulting from experimental manipulations and batch-to-batch variability, necessitating varied concentration gradients and experimental duplicates to mitigate stochastic influences. Prevalent models and processing techniques have struggled to address these challenges, often yielding poor results. In contrast, the attention mechanism employed in X-Profiler enables the extraction of highly informative, accurate, and structurally intact single-cell slice images, while filtering out uninformative, erroneous, and poor-quality images. This selective extraction improves the model’s performance.

Thanks to the advancement of current high-throughput and high-content techniques, enormous amounts of biological data are generated every moment to help us gain a better understanding of life. However, significant noise accompanies the true signal in these data. For example, the cell painting technique allows us to stain various organelles simultaneously, but the capability to use these images to establish biologically relevant models depends largely on the algorithm used to distinguish the information from the noise. Current methods, in general, treat every slice of the data, such as each single-cell image, equally, and are limited in how fully they can utilize high-content data. Here we propose a deep learning model that has the transformer architecture at its core to effectively assign different weights to different pieces of data. By doing this, we are able to build models that pay more attention to high-quality data and less attention to low-quality data, which in effect suppresses the influence of noise. The results of this study show that by paying different amounts of attention to different pieces of data, our models achieve better performance in the tasks presented earlier. The same approach could also be applied to broader areas to improve the performance of current approaches.

Model parameter selection

Numerous parameters can influence the performance of a model, such as the number of epochs, training batches, learning rate, optimizer, and number of transformer attention layers. In this part of our research, we evaluated a selection of critical parameters empirically, without relying on subjective human expertise. We first explored the patch number, a key element of our study, by conducting trials during the training and validation phases with patch numbers ranging from 8 to 12, and during the prediction phase with a broader range from 10 to 750. Our findings indicated that the optimal patch number for training and validation was 8, while during the prediction phase, 100 proved to be most effective. The performance metrics, including accuracy, area under the curve (AUC), and recall, peaked when the training and validation patch number was set to 8 and plateaued once the prediction patch number reached 100 (Fig. 6a). Regarding the consumption of computing resources, the patch number used during the training and validation phases had no tangible impact on processing time or GPU utilization. Nevertheless, a discernible change in resource usage was correlated with an increase in the patch number used for the prediction phase (Fig. 6b and c). We also ran trials on the batch size and the number of attention layers. The batch size had no significant effect on the model’s performance, and the optimal number of attention layers was three (Supplementary Table 3).

Conclusion and future directions

Compared to DeepProfiler and CellProfiler, X-Profiler is a novel HCA process that leverages both our understanding of cellular images and advanced deep learning algorithms. It excels in screening and utilizing informative and high-quality cell-slice images to extract task-relevant cellular features, resulting in superior performance in three drug property-related tasks: (1) determination of drug inhibition of hERG ion channels, (2) prediction of drug-induced mitochondrial toxicity, and (3) classification of compounds. In addition, X-Profiler can link different compounds based on similar cell phenotypes, highlighting its potential in the field of drug repurposing and indication expansion.

In conclusion, HCS is an emerging field that is still rapidly evolving in terms of experimental implementation and analytical methods and has the potential to address diverse biological problems. The key to achieving credible results in downstream tasks lies in the ability of analytical tools or computational models to select the useful parts from a large number of cell images of varying quality. Among the tested solutions for downstream tasks, CellProfiler and DeepProfiler showed moderate performance, while X-Profiler showed promising results. This is because X-Profiler, which leverages deep learning to use multiple cell slices for model training and verification simultaneously, can accurately capture key information and eliminate noise interference. Additionally, X-Profiler can distinguish between cells with similar phenotypes that were perturbed by different compounds, suggesting many possibilities for downstream applications. Overall, X-Profiler has potential as a valuable tool for high-content cell image analysis, which can contribute to various fields such as drug discovery and development, disease diagnosis, and personalized medicine.

In the next phase of our research, we plan to expand the application of X-Profiler to more cell phenotyping tasks and high-content images induced by other perturbations, such as RNA interference (RNAi) or CRISPR Cas9. Our goal is to demonstrate the versatility and effectiveness of our model in analyzing various types of cellular image data. However, we also acknowledge that algorithms based on cell phenotype images have limitations, particularly when the changes in cell phenotype induced by a compound are minimal, which may make it difficult for the model to accurately assess its toxicity and other characteristics. To address this issue, we intend to explore the integration of other high-level information sources, such as transcriptome data or videos of cells captured under brightfield conditions, using multimodal learning techniques. We believe that incorporating multiple sources of information will enhance the accuracy and robustness of our model, enabling it to tackle a wider array of biological problems.

Methods

Constructing the cardiotoxicity high-content image data

The hiPSC-CMs at day 30 after cardiac induction were cryopreserved as Cauliscell hiPSC-CMs (Cauliscell Company, Nanjing, China) and thawed in a 37 °C water bath with gentle shaking. After centrifugation and counting, the cells were added to 384-well plates precoated with 10 µg/ml recombinant human vitronectin at 12,000 cells/well (Cauliscell) in cardiomyocyte plating medium (Cauliscell). After 24 h, the volume was replaced with cardiomyocyte maintenance medium, which was changed every other day. Once the cells started to beat rhythmically, we added the compounds, which were selected from the FDA-approved compound library (FDA-approved drug library, MedChemExpress) for cardiotoxicity testing.

To obtain an appropriate image dataset for the hERG inhibition task, we selected 100 compounds for HCS experiments and generated the corresponding high-content images using a high-content imager. We first collected all compounds with hERG IC50 values through the Python API of the ChEMBL database and then selected 100 compounds by deduplicating and by filtering out drug molecules with ambiguous hERG IC50 values, which had been determined experimentally with the patch clamp technique⁴⁶. The compounds were then labeled as cardiotoxic or non-cardiotoxic based on their hERG IC50 values using a threshold of 10 µM (less than or equal to 10 µM for the toxic label and greater than 10 µM for the nontoxic label). This resulted in 47 cardiotoxic and 53 non-cardiotoxic compounds. Of the cardiotoxic drugs, 23 had hERG-inhibiting activity and targeted the gene potassium voltage-gated channel subfamily H member 2 (KCNH2). Detailed information for each compound, including plate name, drug name, hERG IC50 value range, and cardiotoxicity label, is provided in Supplementary Table 2.
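The labeling rule above can be written as a short function. This is a minimal sketch: the 10 µM threshold and the inclusive boundary come from the text, while the function name, the label strings, and the example IC50 values are hypothetical.

```python
# Threshold taken from the text: IC50 <= 10 µM is labeled cardiotoxic.
HERG_IC50_THRESHOLD_UM = 10.0


def label_cardiotoxicity(herg_ic50_um: float) -> str:
    """Return the binary label for a compound given its hERG IC50 in µM."""
    if herg_ic50_um <= HERG_IC50_THRESHOLD_UM:
        return "cardiotoxic"
    return "non-cardiotoxic"


# Hypothetical IC50 values, chosen only to exercise the boundary case.
examples = {"compound_a": 0.5, "compound_b": 10.0, "compound_c": 25.0}
labels = {name: label_cardiotoxicity(ic50) for name, ic50 in examples.items()}
```

Note that a compound sitting exactly at 10 µM falls on the toxic side of the rule.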

Each selected compound was added to the wells of 384-well plates at working concentrations of 10 µM, 3.33 µM, 1.11 µM, 0.37 µM, 0.12 µM, and 0.04 µM, with 3 replicates for each compound at each concentration. The control group was set with 0.1% DMSO (Sigma), and the blank control was set with water. After incubating the cells with compounds for 72 h, working concentration dyes of CM-H2DCFDA (Thermo) at 5 µM, TMRM (Thermo) at 20 nM, and Hoechst33342 (Thermo) at 5 µg/ml were prepared with maintenance medium (Cauliscell) and added to the cells for 30 min. The cells were washed with HBSS (Beyotime Biotechnology), and YOYO-3 (Thermo) was added at a final concentration of 2 nM, followed by incubation at room temperature for 10 min and washing with HBSS (Beyotime Biotechnology). Finally, the 384-well plate (PerkinElmer) was placed on a high-content imager (Operetta CLS, PerkinElmer), and images were taken with a 20x water objective in confocal mode with 9 fields of view per well, using the filter parameters shown in Supplementary Table 1.
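The listed working concentrations are consistent with a three-fold serial dilution starting at 10 µM. The sketch below reproduces the series and counts the wells and fields imaged per compound. The dilution factor is an inference from the listed values, not something the text states, and the variable names are illustrative.

```python
TOP_CONCENTRATION_UM = 10.0   # highest working concentration (from the text)
DILUTION_FACTOR = 3.0         # assumption: inferred from the listed values
N_CONCENTRATIONS = 6          # from the text
REPLICATES = 3                # replicate wells per compound and concentration
FIELDS_PER_WELL = 9           # fields of view imaged per well

# Concentration k is the top concentration divided by 3**k, rounded to 2 decimals.
concentrations_um = [
    round(TOP_CONCENTRATION_UM / DILUTION_FACTOR ** k, 2)
    for k in range(N_CONCENTRATIONS)
]

# Compound-treated wells and imaged fields per compound (controls excluded).
wells_per_compound = N_CONCENTRATIONS * REPLICATES
fields_per_compound = wells_per_compound * FIELDS_PER_WELL
```

Under these numbers each compound occupies 18 treated wells and contributes 162 imaged fields, before any quality filtering.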

Constructing the mitochondrial toxicity high-content image data

For the mitochondrial toxicity task, we used open-source high-content images from Bray et al. [20]. High-content images from a total of fifty-five plates covering numerous compounds were downloaded, and mitochondrial toxicity labels obtained from the PubChem database (https://pubchem.ncbi.nlm.nih.gov/assay/pcget.cgi?query=download&record_type=datatable&actvty=all&response_type=save&aid=720637) were assigned to each compound. The source assay is the qHTS assay for small-molecule disruptors of the mitochondrial membrane potential from the Tox21 program. The development history of the Tox21 “10K” chemical library, together with the data workflows implemented to ensure quality chemical annotations and to allow various reproducibility assessments, has been described by Richard et al.

CellProfiler and DeepProfiler data processing

For the cardiotoxicity and mitochondrial toxicity high-content images, we extracted cell phenotype features using the standard procedures of CellProfiler (version 4.2.4). After processing these features, we used the LGBMClassifier model from LightGBM, which follows the scikit-learn estimator interface, to perform the toxicity classification tasks. Once CellProfiler had identified the locations of single cells in the images, we passed the images and locations to DeepProfiler (version 0.3.1). For the classification tasks, we used the pretrained EfficientNet model deployed within DeepProfiler to compute embeddings. The CellProfiler-LightGBM and DeepProfiler results were evaluated and compared with those of X-Profiler to assess accuracy and generalizability. For all three models (X-Profiler, DeepProfiler and CellProfiler-LightGBM), we partitioned the 285 compounds labeled for mitochondrial toxicity into a training set of 190 compounds and a test set of 95 compounds. The 100 compounds for the cardiotoxicity task, obtained from the ChEMBL database and labeled by hERG IC50, were divided into training and test sets containing 68 and 32 compounds, respectively.
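A compound-level partition of this kind can be sketched as follows. The set sizes (190/95 and 68/32) come from the text. The random shuffle, the fixed seed, and the function and identifier names are assumptions, since the text does not say how compounds were assigned to each set.

```python
import random


def split_by_compound(compound_ids, n_train, seed=0):
    """Shuffle unique compound IDs and split them into train and test lists.

    Splitting by compound keeps all images of one compound on the same side,
    so no compound contributes to both training and testing.
    """
    unique_ids = sorted(set(compound_ids))
    rng = random.Random(seed)  # fixed seed only for reproducibility of the sketch
    rng.shuffle(unique_ids)
    return unique_ids[:n_train], unique_ids[n_train:]


# Hypothetical compound identifiers standing in for the real compound names.
cardio_ids = [f"cardio_{i}" for i in range(100)]
cardio_train, cardio_test = split_by_compound(cardio_ids, n_train=68)
```

The same call with 285 identifiers and `n_train=190` gives the mitochondrial toxicity partition sizes.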

ROS and MMP measurement

To measure ROS and MMP, we utilized cell phenotype features calculated from our high-content images of 100 compounds, which were used for cardiotoxicity assessment. We designed two CellProfiler analysis protocols that can automatically detect and quantify fluorescence intensity, which proved to be useful for analyzing large image datasets. The ROS and MMP measurements were obtained from the suppressed fluorescent channels of the H2DCFDA and TMRM cell-based assay kit. We computed a reduced dataset with the well-mean feature vector per well, followed by normalizing all features by subtracting the mean of each plate layout from each feature.
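The aggregation and normalization described above can be sketched in plain Python. This is an illustrative reading, assuming each cell is a `(plate, well, feature_vector)` record and that "the mean of each plate layout" means the mean of the well profiles on a plate. The function names and the record layout are hypothetical.

```python
from collections import defaultdict
from statistics import fmean


def well_means(cell_rows):
    """Average per-cell feature vectors into one mean profile per (plate, well)."""
    grouped = defaultdict(list)
    for plate, well, features in cell_rows:
        grouped[(plate, well)].append(features)
    return {
        key: [fmean(column) for column in zip(*vectors)]
        for key, vectors in grouped.items()
    }


def subtract_plate_mean(well_profiles):
    """Center each feature by subtracting its mean over all wells of the plate."""
    by_plate = defaultdict(list)
    for (plate, _well), profile in well_profiles.items():
        by_plate[plate].append(profile)
    plate_mean = {
        plate: [fmean(column) for column in zip(*profiles)]
        for plate, profiles in by_plate.items()
    }
    return {
        (plate, well): [v - m for v, m in zip(profile, plate_mean[plate])]
        for (plate, well), profile in well_profiles.items()
    }


# Toy data: two cells in well A01 and one cell in well A02 on plate P1.
rows = [
    ("P1", "A01", [1.0, 10.0]),
    ("P1", "A01", [3.0, 14.0]),
    ("P1", "A02", [4.0, 20.0]),
]
normalized = subtract_plate_mean(well_means(rows))
```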

X-Profiler model design

X-Profiler is a plug-and-play flexible model framework that is not restricted to a particular backbone. The model processes cell-slice images by normalizing and grouping them by compound and concentration. First, single cell locations are extracted from CellProfiler by calculating the center coordinate of the nucleus. Second, images are rescaled with the global fluorescence intensity, and third, DeepProfiler crops cell slices from 3 channels based on the x and y coordinates for the center of a single nucleus. In all experiments, cell slices were cropped from a region of 96 × 96 pixels centered on the nucleus without resizing. The resulting 96 × 288-pixel images were preserved for model construction.
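The cropping step can be sketched in plain Python as follows. This is an illustrative reading of the text, not DeepProfiler's implementation. Images are assumed to be nested lists indexed as `channel[y][x]`, the three channel crops are assumed to be placed side by side (one layout consistent with the reported 96 × 288 size), and cells whose window would cross the image border are skipped, a case the text does not address. The function names are hypothetical.

```python
CROP_SIZE = 96  # crop edge length in pixels (from the text)


def crop_centered(channel, cx, cy, size=CROP_SIZE):
    """Return a size x size window centered on (cx, cy), or None at the border."""
    half = size // 2
    x0, y0 = cx - half, cy - half
    height, width = len(channel), len(channel[0])
    if x0 < 0 or y0 < 0 or x0 + size > width or y0 + size > height:
        return None  # assumption: skip nuclei too close to the image edge
    return [row[x0:x0 + size] for row in channel[y0:y0 + size]]


def cell_slice(channels, cx, cy):
    """Crop every channel around one nucleus and join the crops horizontally."""
    crops = [crop_centered(channel, cx, cy) for channel in channels]
    if any(crop is None for crop in crops):
        return None
    # Row r of the result is row r of crop 0, then crop 1, then crop 2.
    return [sum(rows, []) for rows in zip(*crops)]


# Toy 200 x 200 images; channel k is filled with the constant value k.
toy_channels = [[[k] * 200 for _ in range(200)] for k in range(3)]
toy_slice = cell_slice(toy_channels, cx=100, cy=100)
```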

Cell-slice images stained with different dyes were integrated as different channels into a single multichannel cell-slice image. This approach reduces noise interference and allows the model to perform the classification task using a few representative cells. Furthermore, during training only a limited number of cell images are fed to the model at a time, which prevents the premature overfitting that the complex information in too many cell images would otherwise cause.

The X-Profiler model consists of three main parts: a single cell-slice encoder, a multiple cell-slice encoder, and a classifier. The single cell-slice encoder uses a shared CNN (ResNet18) to obtain an unbiased latent representation of each single cell-slice image and to improve the generalization of that representation. The multiple cell-slice encoder is built on a Transformer, which has strong contextualization capabilities for integrating information from each element in the sequence. A CLS token is prepended to the sequence to filter out irrelevant cell representations and improve model robustness. The self-attention mechanism of the Transformer fuses the latent representations into the one corresponding to the CLS token, which is then fed into a classifier consisting of a linear layer. The overall flow of prediction is illustrated in Fig. 1.
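The three parts can be summarized compactly in equations. The notation below is introduced here for illustration and is not the authors': x_i is the i-th cell-slice image in a sequence of n slices, c is the learnable CLS embedding, and the softmax output is assumed because the model is trained with cross-entropy.

```latex
\begin{aligned}
z_i &= f_{\mathrm{CNN}}(x_i) \in \mathbb{R}^{d}, \qquad i = 1, \dots, n \\
H^{(0)} &= [\, c;\ z_1;\ \dots;\ z_n \,] \in \mathbb{R}^{(n+1) \times d} \\
H^{(\ell)} &= \mathrm{TransformerLayer}_{\ell}\big(H^{(\ell-1)}\big), \qquad \ell = 1, \dots, L \\
\hat{y} &= \mathrm{softmax}\big(W\, h^{(L)}_{\mathrm{CLS}} + b\big)
\end{aligned}
```

Here h_CLS^{(L)} is the first row of H^{(L)}, and W and b are the weights of the linear classifier. With the hyperparameters reported in the Methods, d = 512, L = 2, and n = 8 during training.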

X-Profiler training and inference

During the training process, we used the standard supervised training method with cross-entropy as the loss function, along with learning rate warm-up and cosine decay techniques. In order to account for various experimental and computational factors, as well as potential inaccuracies in the labels themselves, we also employed label smoothing to reduce the model’s confidence.
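The effect of label smoothing on the loss can be illustrated with a few lines of plain Python. This sketch assumes the common convention in which the smoothing mass is spread uniformly over all classes (the convention used by PyTorch's `CrossEntropyLoss`), and it uses the smoothing factor of 0.1 reported with the training hyperparameters. The function name is hypothetical.

```python
import math


def smoothed_cross_entropy(logits, target, smoothing=0.1):
    """Cross-entropy between softmax(logits) and a label-smoothed target."""
    n_classes = len(logits)
    # Numerically stable log-softmax.
    peak = max(logits)
    log_norm = peak + math.log(sum(math.exp(v - peak) for v in logits))
    log_probs = [v - log_norm for v in logits]
    # Target distribution: (1 - eps) on the true class plus eps / K on every class.
    target_dist = [
        smoothing / n_classes + (1.0 - smoothing) * (i == target)
        for i in range(n_classes)
    ]
    return -sum(q * lp for q, lp in zip(target_dist, log_probs))


# A confident, correct prediction is penalized more once smoothing is applied.
hard_loss = smoothed_cross_entropy([5.0, -5.0], target=0, smoothing=0.0)
soft_loss = smoothed_cross_entropy([5.0, -5.0], target=0, smoothing=0.1)
```

Because the smoothed target never reaches probability 1, the loss keeps penalizing extreme logits, which is how smoothing tempers overconfident predictions when labels may be noisy.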

To enhance the robustness of our model, we applied random cropping during training. Single-cell images were first resized to a resolution of 128 × 128 pixels, and then randomly cropped to 96 × 96 pixels. Additionally, we utilized TrivialAugment⁴⁷, which randomly selects one augmentation method from a diverse set of image augmentation techniques for each sample. This approach helps in improving the generalization capability of the model. During the testing phase, we used center cropping instead of random cropping and omitted the TrivialAugment step to ensure consistency in evaluation.

The optimized X-Profiler models were trained for up to 50 epochs. To prevent overfitting, we employed dropout and AdamW regularization methods. Model training was implemented using PyTorch (1.12.0 + cu116). A workstation with an Intel(R) Xeon(R) Gold 5320 (104 core processors, 2.2 GHz) CPU and two NVIDIA A100 graphics-processing units was used for model training.

We used the AdamW optimizer with a weight decay of 0.05. The batch size was set to 48. The transformer model consisted of 2 layers with 8 heads, and the hidden state dimension was 512. We applied label smoothing with a factor of 0.1 and used a learning rate warm-up for the first 5 epochs. These parameters were consistently applied across different tasks.
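The learning-rate schedule implied by these settings can be sketched as below. The 5 warm-up epochs and the 50-epoch budget come from the text. The linear shape of the warm-up, the per-epoch granularity, and the base learning rate are assumptions, since the text does not report them.

```python
import math


def lr_at_epoch(epoch, base_lr, warmup_epochs=5, total_epochs=50):
    """Linear warm-up to base_lr, then cosine decay over the remaining epochs."""
    if epoch < warmup_epochs:
        return base_lr * (epoch + 1) / warmup_epochs
    progress = (epoch - warmup_epochs) / (total_epochs - warmup_epochs)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


BASE_LR = 1e-4  # hypothetical value; the base learning rate is not reported
schedule = [lr_at_epoch(epoch, BASE_LR) for epoch in range(50)]
```

In this sketch the rate climbs from one fifth of the base value to the base value over the first five epochs and then decays smoothly toward zero.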

We observed that increasing the number of patches generally improves performance but also significantly increases computational costs. Moreover, excessively large patch numbers can lead to overfitting. Through extensive testing with various settings (2, 4, 8, 12, 16, 32), we found that using 8 patches provided the best balance between model performance and computational efficiency, as validated on the test set.

In all our tests, we employed five-fold cross-validation, using the average result of these five experiments as the final outcome. This approach ensures robustness and reliability in our performance evaluation.

In the testing phase, we modified the training procedure slightly. Whereas in training we randomly selected 8 cell-slice images to form each multiple-cell-slice sequence, in testing we increased the number of cell slices per sequence; the sequence lengths evaluated were 12, 50, 100, 150, 200, 250, 300, 350, 400, 450, 500, 550, 600, 650, 700, and 750. To balance model performance against computational cost, we randomly sampled 100 sets of multiple cell slices and took the average classification result over those sequences as the final outcome. This approach improves the chances of selecting valuable cell-slice images while reducing computational cost, because not every individual cell has to be evaluated.
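The test-time procedure can be sketched as follows. The 100 sampled sets and the averaging come from the text. `predict_fn` is a hypothetical stand-in for the trained model that maps a list of cell slices to the probability of the positive class, and sampling without replacement within each set is an assumption.

```python
import random
from statistics import fmean


def predict_compound(cell_slices, predict_fn, seq_len=100, n_sets=100, seed=0):
    """Average the model output over n_sets randomly sampled slice sequences."""
    rng = random.Random(seed)
    # Never ask for more slices than the compound actually has.
    k = min(seq_len, len(cell_slices))
    scores = [predict_fn(rng.sample(cell_slices, k)) for _ in range(n_sets)]
    return fmean(scores)


# Toy stand-in: each "slice" is a number and the "model" returns their mean.
toy_slices = [0.0] * 50 + [1.0] * 50
full_score = predict_compound(toy_slices, predict_fn=fmean, seq_len=100)
short_score = predict_compound(toy_slices, predict_fn=fmean, seq_len=12)
```

Shorter sequences give noisier per-set scores, which is why averaging over many sampled sets helps stabilize the compound-level result.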

X-Profiler attention map

As we utilized the CLS token embeddings generated by Transformer for classification purposes, we obtained the attention map by calculating the dot product between the query representation and key representation among the tokens in Transformer. By applying the Softmax function, each element of the attention map was assigned a value ranging from 0 to 1.
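The attention-map computation can be illustrated for a single head in plain Python. The text describes a dot product followed by Softmax. The scaling by the square root of the key dimension follows the standard Transformer formulation and is an assumption here, as are the toy vectors and the function names.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_map(queries, keys):
    """Row i holds the attention weights of token i over all tokens."""
    dim = len(keys[0])
    return [
        softmax([
            sum(qi * ki for qi, ki in zip(query, key)) / math.sqrt(dim)
            for key in keys
        ])
        for query in queries
    ]


# Toy 2-dimensional tokens; row 0 plays the role of the CLS token.
toy_queries = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
toy_keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
attn = attention_map(toy_queries, toy_keys)
cls_weights = attn[0]  # how strongly the CLS token attends to each token
```

Each row sums to 1, so the CLS row can be read as a relative weighting of the cell slices in the sequence.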

Cluster and statistical analysis

To evaluate the performance of feature extraction methods for cell-slice image analysis, we used the t-SNE algorithm from the Python scikit-learn package to reduce the dimensionality of the features or embeddings computed by X-Profiler, CellProfiler, and DeepProfiler. We then examined the resulting component distributions to investigate the relationship between X-Profiler embeddings and the cell phenotypes observed in the images, as well as the effectiveness of X-Profiler compared to CellProfiler or DeepProfiler in capturing useful information. To assess clustering performance, we computed the silhouette coefficient and the adjusted Rand score with scikit-learn. We applied the Kruskal‒Wallis test to the null hypothesis that the MMP or ROS intensities of images treated with a compound at six different concentrations (0.04 µM, 0.12 µM, 0.37 µM, 1.11 µM, 3.33 µM, and 10 µM) were equal, with significance set at p < 0.05.
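For reference, the Kruskal‒Wallis H statistic can be computed from pooled ranks as in the sketch below. This is a textbook version without the tie correction that library implementations such as `scipy.stats.kruskal` apply, and it is not the authors' code. With six concentration groups the statistic is compared against a chi-square distribution with 5 degrees of freedom, whose 0.05 critical value is approximately 11.07.

```python
def kruskal_wallis_h(groups):
    """Return the Kruskal-Wallis H statistic (no tie correction)."""
    pooled = sorted((value, gi) for gi, group in enumerate(groups) for value in group)
    n = len(pooled)
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        average_rank = (i + j) / 2 + 1  # tied values share their mean 1-based rank
        for t in range(i, j + 1):
            ranks[t] = average_rank
        i = j + 1
    rank_sums = [0.0] * len(groups)
    for (_value, gi), rank in zip(pooled, ranks):
        rank_sums[gi] += rank
    between = sum(rs ** 2 / len(group) for rs, group in zip(rank_sums, groups))
    return 12.0 / (n * (n + 1)) * between - 3.0 * (n + 1)


CHI2_CRITICAL_DF5_ALPHA_005 = 11.070  # chi-square 0.95 quantile for 5 d.o.f.


def rejects_equal_intensity(groups_by_concentration):
    """Apply the p < 0.05 decision rule for six concentration groups."""
    return kruskal_wallis_h(groups_by_concentration) > CHI2_CRITICAL_DF5_ALPHA_005
```

The critical-value comparison is equivalent to the p < 0.05 rule only for exactly six groups; other group counts need the quantile for their own degrees of freedom.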

Data availability

Data is provided within the manuscript or supplementary information files.

References

Ashburn, T. T. & Thor, K. B. Drug repositioning: Identifying and developing new uses for existing drugs. Nat. Rev. Drug Discov. 3, 673–683 (2004).
Ni, J. et al. Obstacles and opportunities in Chinese pharmaceutical innovation. Globalization Health 13, 1–9 (2017).
Perakakis, N., Yazdani, A., Karniadakis, G. E. & Mantzoros, C. Vol. 87, A1–A9 (Elsevier, 2018).
Paananen, J. & Fortino, V. An omics perspective on drug target discovery platforms. Brief. Bioinform. 21, 1937–1953. https://doi.org/10.1093/bib/bbz122 (2019).
Lusher, S. J., McGuire, R., van Schaik, R. C., Nicholson, C. D. & de Vlieg, J. Data-driven medicinal chemistry in the era of big data. Drug Discov. Today 19, 859–868 (2014).
Mehonic, A. et al. Memristors—From in-memory computing, deep learning acceleration, and spiking neural networks to the future of neuromorphic and bio‐inspired computing. Adv. Intell. Syst. 2, 2000085 (2020).
Chen, H. Machine learning for information retrieval: Neural networks, symbolic learning, and genetic algorithms. J. Am. Soc. Inform. Sci. 46, 194–216 (1995).
Tian, G., Harrison, P. J., Sreenivasan, A. P., Carreras-Puigvert, J. & Spjuth, O. Combining molecular and cell painting image data for mechanism of action prediction. Artif. Intell. Life Sci. 3, 100060 (2023).
Bickle, M. The beautiful cell: High-content screening in drug discovery. Anal. Bioanal. Chem. 398, 219–226 (2010).
Perlman, Z. E. et al. Multidimensional drug profiling by automated microscopy. Science 306, 1194–1198 (2004).
Snijder, B. & Pelkmans, L. Origins of regulated cell-to-cell variability. Nat. Rev. Mol. Cell Biol. 12, 119–125 (2011).
Ljosa, V. et al. Comparison of methods for image-based profiling of cellular morphological responses to small-molecule treatment. J. Biomol. Screen. 18, 1321–1329 (2013).
Young, D. W. et al. Integrating high-content screening and ligand-target prediction to identify mechanism of action. Nat. Chem. Biol. 4, 59–68 (2008).
Sundaramurthy, V. et al. Integration of chemical and RNAi multiparametric profiles identifies triggers of intracellular mycobacterial killing. Cell Host Microbe 13, 129–142 (2013).
Akbarzadeh, M. et al. Morphological profiling by means of the cell painting assay enables identification of tubulin-targeting compounds. Cell Chem. Biol. 29, 1053–1064. e1053 (2022).
Castoreno, A. B. et al. Small molecules discovered in a pathway screen target the rho pathway in cytokinesis. Nat. Chem. Biol. 6, 457–463 (2010).
Lejal, V., Cerisier, N., Rouquié, D. & Taboureau, O. Assessment of drug-induced liver injury through cell morphology and gene expression analysis. Chem. Res. Toxicol. 36, 1456–1470 (2023).
Loo, L. H. et al. An approach for extensibly profiling the molecular states of cellular subpopulations. Nat. Methods 6, 759–765 (2009).
Feldman, D. et al. Pooled genetic perturbation screens with image-based phenotypes. Nat. Protoc. 17, 476–512 (2022).
Feldman, D. et al. Optical pooled screens in human cells. Cell 179, 787–799 (2019).
Funk, L. et al. The phenotypic landscape of essential human genes. Cell 185, 4634–4653 (2022).
Liberali, P., Snijder, B. & Pelkmans, L. A hierarchical map of regulatory genetic interactions in membrane trafficking. Cell 157, 1473–1487 (2014).
Fischer, B. et al. A map of directional genetic interactions in a metazoan cell. Elife 4, e05464 (2015).
Yin, Z. et al. A screen for morphological complexity identifies regulators of switch-like transitions between discrete cell shapes. Nat. Cell Biol. 15, 860–871 (2013).
Chandrasekaran, S. N. et al. JUMP cell painting dataset: Morphological impact of 136,000 chemical and genetic perturbations. bioRxiv 2023.03.23.534023 (2023).
Stirling, D. R. et al. CellProfiler 4: Improvements in speed, utility and usability. BMC Bioinform. 22, 1–11 (2021).
Dao, D. et al. CellProfiler analyst: Interactive data exploration, analysis and classification of large biological image sets. Bioinformatics 32, 3210–3212 (2016).
Rocha, L., Velho, L. & Carvalho, P. C. P. In Proceedings. XV Brazilian Symposium on Computer Graphics and Image Processing. 99–105 (IEEE).
Haralick, R. M., Shanmugam, K. & Dinstein, I. H. Textural features for image classification. IEEE Trans. Syst. Man Cybern. 610–621 (1973).
Moshkov, N. et al. Learning representations for image-based profiling of perturbations. bioRxiv 2022.08.12.503783 (2022).
Alber, M. et al. Integrating machine learning and multiscale modeling—Perspectives, challenges, and opportunities in the biological, biomedical, and behavioral sciences. NPJ Digit. Med. 2, 115 (2019).
Loo, L. H., Wu, L. F. & Altschuler, S. J. Image-based multivariate profiling of drug responses from single cells. Nat. Methods 4, 445–453 (2007).
Vaswani, A. et al. Attention is all you need. Adv. Neural. Inf. Process. Syst. 30 (2017).
Richard, A. M. et al. The Tox21 10K compound library: Collaborative chemistry advancing toxicology. Chem. Res. Toxicol. 34, 189–216 (2020).
Begriche, K., Massart, J., Robin, M. A., Borgne-Sanchez, A. & Fromenty, B. Drug-induced toxicity on mitochondria and lipid metabolism: Mechanistic diversity and deleterious consequences for the liver. J. Hepatol. 54, 773–794 (2011).
Hargreaves, I. P., Al Shahrani, M., Wainwright, L. & Heales, S. J. Drug-induced mitochondrial toxicity. Drug Saf. 39, 661–674 (2016).
Tang, X., Wang, Z., Hu, S. & Zhou, B. Assessing drug-induced mitochondrial toxicity in cardiomyocytes: Implications for preclinical cardiac safety evaluation. Pharmaceutics 14, 1313 (2022).
Haghighi, M., Caicedo, J. C., Cimini, B. A., Carpenter, A. E. & Singh, S. High-dimensional gene expression and morphology profiles of cells across 28,000 genetic and chemical perturbations. Nat. Methods 19, 1550–1557 (2022).
Meyer, J. N., Hartman, J. H. & Mello, D. F. Mitochondrial toxicity. Toxicol. Sci. 162, 15–23. https://doi.org/10.1093/toxsci/kfy008 (2018).
de Wit, S., Glen, C., de Boer, R. A. & Lang, N. N. Mechanisms shared between cancer, heart failure, and targeted anti-cancer therapies. Cardiovasc. Res. 118, 3451–3466 (2022).
Niemann, B. et al. Caloric restriction reduces sympathetic activity similar to beta-blockers but conveys additional mitochondrio-protective effects in aged myocardium. Sci. Rep. 11, 1931 (2021).
Pereira, C. Drug-induced cardiac mitochondrial toxicity and protection: From doxorubicin to carvedilol. Curr. Pharm. Des. 17, 2113–2129 (2011).
Lozano-Sepúlveda, S. A., Rincón-Sanchez, A. R. & Rivas-Estilla, A. M. Antioxidants benefits in hepatitis C infection in the new DAAs era. Ann. Hepatol. 18, 410–415 (2019).
Reshi, M. L., Su, Y. C. & Hong, J. R. RNA viruses: ROS-mediated cell death. Int. J. Cell Biol. 2014 (2014).
Ghosh, M. C. et al. Therapeutic inhibition of HIF-2α reverses polycythemia and pulmonary hypertension in murine models of human diseases. Blood J. Am. Soc. Hematol. 137, 2509–2519 (2021).
Molleman, A. Patch Clamping: An Introductory Guide to Patch Clamp Electrophysiology (Wiley, 2003).
Müller, S. G. & Hutter, F. In Proceedings of the IEEE/CVF International Conference on Computer Vision. 774–782.

Author information

Authors and Affiliations

XtalPi Innovation Center, 706 Block B, Dongsheng Building, Haidian District, Beijing, China
Xiangrui Gao, Fan Zhang, Xueyu Guo, Mengcheng Yao, Xiaoxiao Wang, Dong Chen, Genwei Zhang, Xiaodong Wang & Lipeng Lai

Contributions

Xiangrui Gao, Xiaodong Wang and Lipeng Lai conceived this research. Fan Zhang, Mengcheng Yao and Xueyu Guo curated the dataset. Xiangrui Gao, Fan Zhang and Xueyu Guo performed data analysis. Xiangrui Gao devised deep learning algorithms. Xiaoxiao Wang and Dong Chen conducted the HCS experiments. Xueyu Guo, Fan Zhang, Xiaoxiao Wang and Genwei Zhang wrote and modified the paper. Xiaodong Wang and Lipeng Lai supervised this work.

Corresponding authors

Correspondence to Xiaodong Wang or Lipeng Lai.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary Material 1

Supplementary Material 2

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.

About this article

Cite this article

Gao, X., Zhang, F., Guo, X. et al. Attention-based deep learning for accurate cell image analysis. Sci Rep 15, 1265 (2025). https://doi.org/10.1038/s41598-025-85608-9

Received: 18 June 2024
Accepted: 03 January 2025
Published: 08 January 2025
DOI: https://doi.org/10.1038/s41598-025-85608-9