Abstract
High-content analysis (HCA) holds enormous potential for drug discovery and research, but widely used methods can be cumbersome and yield inaccurate results. Noisy and redundant signals in cell images impede accurate deep learning-based image analysis. To address these issues, we introduce X-Profiler, a novel HCA method that combines cellular experiments, image processing, and deep learning modeling. X-Profiler uses a convolutional neural network together with a Transformer to encode high-content images, effectively filtering out noisy signals and precisely characterizing cell phenotypes. In comparative tests on drug-induced cardiotoxicity, mitochondrial toxicity classification, and compound classification, X-Profiler outperformed both DeepProfiler and CellProfiler, two highly recognized and representative methods in this field. Our results demonstrate the utility and versatility of X-Profiler, and we anticipate its wide application in HCA for advancing drug development and disease research.
Introduction
Developing novel pharmaceutical drugs represents a substantial investment of time and resources, yet the success rate is low. The major obstacles hindering drug development include ineffective drug activity, intractable drug toxicity, and marketing difficulties1,2. Medical research focuses on identifying the causes and deciphering the mechanisms of disease, with the aim of eventually developing accurate diagnostic tools and effective treatments. In recent decades, high-throughput methods have emerged that can generate enormous data sets in a short time and in a cost-effective manner3. The exploitation of so-called ‘big data’ will enable research projects that were never previously possible, but should also stimulate a re-evaluation of all our data practices. The high throughput and affordability of current technologies such as omics and imaging, which allow quantitative measurements of many putative targets (e.g. DNA, RNA, protein, metabolite), have exponentially increased the volume of scientific data available for this arduous task4. Data-driven medicinal chemistry approaches have the potential to improve decision making in drug discovery projects, provided that all researchers embrace the role of ‘data scientist’ and uncover the meaningful relationships and patterns in available data5. The success of deep learning models relies heavily on the availability of vast amounts of data and advanced algorithms6. Massive data helps to train the model effectively without premature overfitting, while a well-designed and sufficiently complex network structure endows the model with good memory and information extraction capability7. In addition, the diversity of large data sets makes model training and prediction easier by eliminating complex preprocessing steps, such as segmentation, tracking, tracing, and spatial conversion. As a result, the task can be performed through end-to-end prediction, simplifying the entire process8.
Cellular morphology is a potentially rich data source for interrogating biological perturbations, especially at a large scale9,10,11. For example, cellular morphological profiling of compounds has been used to determine their mechanism of action8,12,13, identify their targets14,15, discover relationships with genes16,17, and characterize cellular heterogeneity18. Genes have been analyzed by creating profiles of cell populations in which the gene is perturbed by CRISPR or RNA interference (RNAi); these profiles have been used to represent the functional landscape of essential human genes19,20,21, identify genetic interactions22,23, and characterize cellular heterogeneity24. Recently, the Joint Undertaking for Morphological Profiling (JUMP)-Cell Painting Consortium released a dataset that contains images and profiles from the Cell Painting assay for over 116,750 unique compounds, over-expression of 12,602 genes, and knockout of 7,975 genes using CRISPR-Cas9 in human osteosarcoma cells (U2OS)25. One of the main tools currently used for image processing and interpretable feature extraction is CellProfiler26,27, developed by the imaging platform of the Broad Institute. It was designed to enable biologists without training in computer vision or programming to quantitatively measure phenotypes from thousands of images automatically. A wide variety of measurements can be generated for each identified cell or subcellular compartment, including morphology28, intensity, and texture29, among others. DeepProfiler, another feature extraction tool based on deep learning, presented by Cimini and Carpenter et al., uses weakly supervised learning (WSL) to model associations between images and treatments, and its learned representation was shown to encode both confounding factors and phenotypic features30.
Despite the obvious advantages of ‘big data’, the field has to overcome important challenges: most of the analyses performed have inherent biological and experimental errors and rely on capturing a snapshot of complex and dynamic biological systems31. Consequently, untimely sample collection due to an incorrect experimental design, or simply due to randomness, can introduce so much ‘noise’ that inciting factors or errors cannot be clearly identified3. During cellular phenotypic experiments, the compound effects on cells are random, and the properties of compounds cannot be accurately distinguished from images of one or a few cells32. Therefore, a ‘noise’ filtering tool is needed to adaptively filter out irrelevant cell-slice images and accurately extract features for cell phenotype characterization. Transformers use a self-attention mechanism to weigh the importance of each token in the context of the entire input sequence, which allows the model to capture long-range dependencies and relationships between tokens33. To address the aforementioned issues, we combined a convolutional neural network with a Transformer to develop X-Profiler, a deep learning-based method for ‘noise’ filtering and downstream task-related feature extraction that extracts valuable information from high-content images for drug analysis.
In this study, we conducted a comprehensive analysis of cell phenotype images, covering aspects from data processing to model training and validation. To evaluate the potential applicability of our method in high-content image processing, we chose three widely recognized downstream tasks in drug discovery: Task 1, determining drug inhibition of hERG (the human Ether-à-go-go-Related Gene) ion channels; Task 2, predicting drug-induced mitochondrial toxicity; and Task 3, classifying compounds. In all tasks, X-Profiler consistently surpassed state-of-the-art (SOTA) performance, demonstrating more accurate and effective feature extraction capabilities.
Results and discussion
Model architecture of X-Profiler
In this study, we address the issue of high impurity in the dataset by introducing a novel deep learning framework. Instead of expecting each individual slice of the dataset to conform directly to a desired label mapping, we explicitly assign them distinct rankings and weights. Formally, we utilized cell painting data to train and verify our model, segmenting large cellular images into individual slices based on the positional information of cells and constructing intercellular associations by arbitrarily selecting portions of single slices. We introduced the concept of a “patch”—bundles of different single-cell slice images subjected to the same perturbation. A patch is derived from the images of cells obtained from a single well in a wet lab experiment, stained and processed using Cell Painting. After cell center localization, images are cropped to a size of 96 × 96 pixels centered on the cells. Due to the density of cells, some images may include multiple adjacent cells. To develop an efficient and robust method, we utilized deep learning models to autonomously extract features from these cell images. The information between cells is valuable, so we do not solely rely on segmented images containing only one cell. As a result, we obtained a large number of cell images, most of which contain a single cell and its surrounding cytoplasm, with a few containing multiple cells.
During training, we randomly sampled several cell images under the same compound treatment from these images to form a “patch.” By employing a ResNet to extract image features and utilizing a transformer to capture the relational patterns within the same patch, we can identify and exclude poor-quality images, thereby enhancing the model’s accuracy.
The training and validation processes of the model are illustrated in Fig. 1a. During these processes, 6 to 12 single-cell image slices were bundled into a patch and fed into the model in batches.
In each epoch, which is a full pass through the dataset, the model performs a forward pass to make predictions and then calculates the loss by comparing these predictions to the actual labels. The backward pass follows, where the model uses backpropagation to compute the gradient of the loss with respect to each weight. The optimizer then updates the model’s weights to minimize the loss. After each epoch, the model’s performance is evaluated on a separate validation set, which helps monitor its ability to generalize to unseen data. During the prediction phase (Fig. 1b), we randomly sampled the images and packaged single-cell images into patches of 10 to 750 units. The model then performs a forward pass and uses the output of the linear layer as the predicted score. Along with the model’s predictions, we also obtain a ranking for each individual cell image within its patch, which completes the data processing workflow.
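The patch construction described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' pipeline: the record layout `(treatment_id, crop_id)`, the function name `build_patches`, and the example treatment names are all hypothetical, and the sketch only covers random bundling of single-cell crops that share a treatment.

```python
import random
from collections import defaultdict


def build_patches(cell_records, patch_size, n_patches, seed=0):
    """Randomly bundle single-cell crops that share a treatment into patches.

    cell_records: iterable of (treatment_id, crop_id) pairs (hypothetical layout).
    Returns a list of (treatment_id, [crop_id, ...]) tuples.
    """
    rng = random.Random(seed)
    by_treatment = defaultdict(list)
    for treatment_id, crop_id in cell_records:
        by_treatment[treatment_id].append(crop_id)

    patches = []
    for treatment_id in sorted(by_treatment):
        crops = by_treatment[treatment_id]
        if len(crops) < patch_size:
            continue  # too few cells to fill one patch for this treatment
        for _ in range(n_patches):
            # Sample without replacement so a patch never repeats a crop.
            patches.append((treatment_id, rng.sample(crops, patch_size)))
    return patches


# Hypothetical example: 40 crops per treatment, patches of 8 cells for training.
records = [("carvedilol_10uM", f"c{i}") for i in range(40)]
records += [("DMSO", f"d{i}") for i in range(40)]
train_patches = build_patches(records, patch_size=8, n_patches=5)
```

A larger `patch_size` (for instance 100) would correspond to the prediction-phase setting, where more cells are bundled per patch.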
X-Profiler’s performance on hERG inhibition, mitochondrial toxicity and compound classification
We designed three tasks to evaluate the capability of X-Profiler. For the hERG inhibition task, we selected 100 compounds (47 toxic and 53 non-toxic) for HCS experiments and generated corresponding high-content images using a high-content imager. We chose 38 compounds (19 toxic and 19 non-toxic) as the test dataset for determining drug inhibition of hERG ion channels. The high-content images were used to assess the hERG inhibition classification capabilities of the three models: X-Profiler, DeepProfiler, and CellProfiler. Our results showed that X-Profiler outperformed the feature extractors DeepProfiler and CellProfiler, achieving an accuracy of 90.6% in cell image classification; the precision, recall and F1 scores for X-Profiler were 80.0%, 95.5% and 86.2%, respectively (Fig. 2a). Moreover, we visualized the landscape of compound features using UMAP, which showed two separate clusters for hERG channel inhibitors and non-hERG channel inhibitors (Fig. 2b). The silhouette score (ss) and adjusted Rand score (ars) of the clusters of toxic and non-toxic compounds were 0.446 and 0.272, respectively, which indicates that X-Profiler is able to capture the cell phenotype changes caused by hERG channel inhibitors.
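For readers who want to reproduce this kind of evaluation, the sketch below computes accuracy, precision, recall and F1 from predicted and true labels. It assumes the standard binary definitions with the toxic class treated as positive; the text does not state which averaging scheme was used, so this is an illustration rather than the exact evaluation code. The labels in the example are made up.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall and F1 for a binary task.

    `positive` marks the class treated as positive (assumed here: toxic = 1).
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Made-up labels for illustration only (1 = toxic, 0 = non-toxic).
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 0, 0, 1, 0, 1]
metrics = binary_metrics(y_true, y_pred)
```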
For the mitochondrial toxicity task, we mapped all 100 compounds to the Tox21 dataset annotation34. Of these, 77 compounds bore additional mitochondrial toxicity labels (Supplementary Table 2). We selected a test set of 15 compounds, of which 1 was labeled ‘Active’, 3 were labeled ‘Inactive’, and 11 were labeled ‘Inconclusive’. Mitochondrial toxicity caused by certain compounds is a widespread form of organ toxicity, which can result in multiorgan damage in the heart, liver, bone, and brain35,36,37. The alterations in cell morphology, texture, and intensity caused by compounds are strongly correlated with mitochondrial toxicity38. We evaluated the classification performance of the three models. X-Profiler outperformed the other two models in terms of accuracy, recall, precision and F1 score, with values of 54.5%, 52.4%, 47.3% and 43.3%, respectively, as shown in Fig. 2c. The landscape for mitochondrial toxicity is shown in Fig. 2d; the silhouette score and adjusted Rand score of the clusters of active, inactive and inconclusive compounds were 0.023 and 0.083, respectively. There may be two reasons for the suboptimal performance on mitochondrial toxicity: (1) the number of labeled compounds is smaller while the number of classes is larger, and less data implies lower model robustness; (2) the effects of mitochondrial dysfunction are often confusingly cell type-specific, as is the case for the majority of known mitochondrial diseases39. The labels were derived from human HepG2 cells, whereas iPSC-CMs might react differently to the same compounds.
The compound classification task involved classifying images based on the cell response to different compounds. For each compound, 20% of the images were used for testing and 80% for training. Our results showed that X-Profiler outperformed DeepProfiler and CellProfiler, achieving an accuracy, precision, recall and F1 score of 0.867, 0.892, 0.871 and 0.881, respectively. X-Profiler thus demonstrates a superior ability to discern relationships among images (Fig. 2e and f).
The performance of X-Profiler on three tasks. (a) The model’s performance on hERG inhibition classification. (b) The landscape of X-Profiler features on the hERG inhibition classification task using t-SNE. (c) The model’s performance on mitochondrial toxicity classification. (d) The landscape of X-Profiler features on the mitochondrial toxicity classification task using t-SNE. (e) The model’s performance on compound classification. (f) The landscape of X-Profiler features for five randomly selected compounds.
Comparison of distance and angle for high-dimensional features
In our investigation, we quantified the distance and cosine value between the high-dimensional image features of various compounds at differing concentrations (X-Profiler: 512-d; DeepProfiler: 672-d; CellProfiler: 613-d). We intended to enhance our understanding of the model’s capacity to discriminate between cellular features after pharmacological intervention. We determined the cosine similarity of compounds with DMSO, as illustrated in Fig. 3a. The cosine values span from −1 to 1, corresponding to angles between 180 degrees and 0 degrees. The cosine values of toxic and non-toxic compounds corresponded to distinct angles in the high-dimensional space. Although the Mahalanobis distance of the compounds relative to DMSO did not reveal a marked difference between toxic and non-toxic compounds, we noted that, across the concentrations of a single compound, the Mahalanobis distance increased with concentration, and these values were used to deduce the IC50 value for each compound (Fig. 3b). A comparative analysis of X-Profiler with CellProfiler and DeepProfiler is presented in Fig. 3c. The distribution of the absolute differences between true and predicted IC50 values underscores X-Profiler’s superior precision, with a mean and standard deviation across all compounds of 2.66 and 2.94, respectively. In contrast, DeepProfiler and CellProfiler yielded 4.45 ± 4.84 and 3.48 ± 3.56 (mean ± standard deviation), respectively. The IC50 prediction curves for the compound carvedilol from X-Profiler, DeepProfiler, and CellProfiler serve as an example, highlighting X-Profiler’s higher accuracy in determining IC50 values (Fig. 3d). By inferring the IC50 values of the compounds from the calculated feature distances, we demonstrate that X-Profiler has great potential to replace traditional IC50 detection methods.
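The two quantities used in this section can be illustrated with a short sketch. The first part computes the cosine similarity between two feature vectors and the corresponding angle. The second part estimates a half-maximal concentration from a distance-versus-concentration series by linear interpolation on a log10 concentration axis. The interpolation is an assumption made for illustration: the curve-fitting procedure used for Fig. 3 is not specified in the text, and the function names and distance values below are hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def angle_degrees(u, v):
    """Angle between two vectors, always within [0, 180] degrees."""
    c = max(-1.0, min(1.0, cosine_similarity(u, v)))  # guard rounding error
    return math.degrees(math.acos(c))


def half_max_concentration(concs, distances):
    """Concentration at which the distance to the control reaches the
    midpoint between its minimum and maximum (log-linear interpolation).
    Returns None if no segment brackets the midpoint.
    """
    low, high = min(distances), max(distances)
    target = low + 0.5 * (high - low)
    pairs = sorted(zip(concs, distances))
    for (c0, d0), (c1, d1) in zip(pairs, pairs[1:]):
        if d0 != d1 and min(d0, d1) <= target <= max(d0, d1):
            frac = (target - d0) / (d1 - d0)
            log_c = math.log10(c0) + frac * (math.log10(c1) - math.log10(c0))
            return 10 ** log_c
    return None


# Concentrations (µM) from the dose series; distances are made-up values.
concs = [0.04, 0.12, 0.36, 1.11, 3.33, 10.0]
dists = [1.0, 1.0, 2.0, 4.0, 5.0, 5.0]
ic50_estimate = half_max_concentration(concs, dists)
```

A parametric fit such as a four-parameter logistic curve would be a natural alternative when the dose series is noisy.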
The feature distance and angle characterize a compound’s toxicity and its dose-dependency. (a, b) The feature cosine value and Mahalanobis distance for compounds when compared with DMSO. The x-axis corresponds to the compounds, while the y-axis denotes the cosine value. Circular dots represent toxic compounds, and upward triangles represent non-toxic compounds. The colors green, blue, yellow, purple, brown, and red correspond to compound concentrations of 0.04 µM, 0.12 µM, 0.36 µM, 1.11 µM, 3.33 µM, and 10.00 µM, respectively. (c) The frequency distribution histogram of the absolute differences between true IC50 values and their predicted counterparts. From left to right: X-Profiler, DeepProfiler, and CellProfiler, with the predicted IC50 values derived from the Mahalanobis distance calculation. (d) The fitted curves for the IC50 values of the compound carvedilol. From left to right: X-Profiler, DeepProfiler, and CellProfiler. The x-axis represents the logarithm of concentration, with the green and red vertical dashed lines indicating the actual and predicted IC50 values, respectively.
Multiple cell slices enable accurate grasp of the compound property
During the feature extraction phase employed by X-Profiler, we obtained a class embedding for a patch of images, referred to as ‘multiple cells features’ (cls embedding), along with the embedding for each individual image in the patch, termed ‘single cells features’. Our research conducted separate analyses on multiple cells and single cells image features in compound classification tasks. Additionally, we compared the landscape of X-Profiler feature with DeepProfiler and CellProfiler. Herein, the multiple cells features aggregate the mean of 100 single cell slice attributes, whereas the single cells features are a stochastic selection from the entirety of single cell slices (Fig. 4a–f). We selected the first ten compounds (alphabetically sorted) from a set of 100 compounds for the t-SNE plot illustration and presented compound distance in two-dimensional space.
In our results, we observed that CellProfiler lacks the capability to differentiate changes in cellular phenotype between compounds, regardless of whether single cells features or multiple cells features are clustered (Fig. 4e, f). Additionally, the single cells feature clusters generated by DeepProfiler also failed to distinguish between compounds (Fig. 4d). The distance between compound features from X-Profiler and DeepProfiler is more explicit in the landscape of multiple cells features than in that of single cells features. For instance, benzethonium chloride and betrixaban appear as a single cluster in the single cells feature landscape (Fig. 4a), while in the landscape of multiple cells features they form discrete clusters (Fig. 4b). This suggests that integrating a greater number of image slices can yield more comprehensive features to accurately predict compound properties. Furthermore, the multiple cells features of DeepProfiler merely attain the level of clustering that X-Profiler achieves with single cells features (Fig. 4c). We believe that the convolutional networks within X-Profiler and DeepProfiler may produce more effective features for the analysis of Cell Painting data than traditional statistical algorithms such as CellProfiler, notwithstanding the limited interpretability inherent to these features. Nevertheless, X-Profiler exhibits superior predictive capacity compared with DeepProfiler. The attention mechanism within X-Profiler improves its aggregation of cellular features, outperforming DeepProfiler, which only calculates the mean of individual cell features. This underscores the importance of leveraging the Transformer architecture to enhance feature extraction from intricate and unstructured data.
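The difference between mean aggregation and attention-based aggregation can be made concrete with a toy example. The sketch below uses a single scaled dot-product attention step with a fixed query vector, which is a deliberately simplified stand-in for a Transformer encoder and its CLS token. The query, the embeddings, and the function names are hypothetical.

```python
import math


def mean_pool(embeddings):
    """Unweighted average of per-cell embeddings."""
    n = len(embeddings)
    return [sum(column) / n for column in zip(*embeddings)]


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(query, embeddings):
    """Weight each cell by scaled dot-product similarity to `query`,
    then return the weighted sum and the attention weights."""
    dim = len(query)
    scores = [sum(q * x for q, x in zip(query, emb)) / math.sqrt(dim)
              for emb in embeddings]
    weights = softmax(scores)
    pooled = [sum(w * emb[j] for w, emb in zip(weights, embeddings))
              for j in range(dim)]
    return pooled, weights


# Two informative cells and one off-target cell (made-up 2-d embeddings).
cells = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
query = [4.0, 0.0]  # hypothetical learned query
pooled, weights = attention_pool(query, cells)
averaged = mean_pool(cells)
```

In this example the off-target cell receives a small weight, so it contributes far less to `pooled` than to `averaged`, which is the intuition behind preferring attention over a plain mean.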
We compared single cell-slice images from four compounds (bedaquiline, belzutifan, carvedilol, and daclatasvir) selected from Fig. 4a. The t-SNE landscape revealed that the cellular embeddings of belzutifan, carvedilol, and daclatasvir formed a tight cluster, markedly distinct from bedaquiline. Likewise, the cellular images of belzutifan, carvedilol, and daclatasvir appeared indistinguishable upon visual inspection (Fig. 4g) and displayed comparable Reactive Oxygen Species (ROS) and Mitochondrial Membrane Potential (MMP) intensities across concentrations, a characteristic not shared with bedaquiline (Fig. 4h).
Feature landscape of single cell features and multiple cell features. (a) The feature landscape of multiple cell features using the X-Profiler model; each point represents a batch of 100 cell-slice images. (b) The feature landscape of single cell features using the X-Profiler model; each point represents a single cell-slice image. (c) The feature landscape of multiple cell features using DeepProfiler; each point represents the average feature of 100 randomly selected single cell-slice images. (d) The feature landscape of single cell features using DeepProfiler; each point represents a single, randomly selected cell-slice image. (e) The feature landscape of multiple cell features using the CellProfiler model; each point represents the average feature of 100 randomly selected single cell-slice images. (f) The feature landscape of single cell features using the CellProfiler model; each point represents a single, randomly selected cell-slice image. (g) The single cell-slice images of four compounds. For the four compounds (bedaquiline, belzutifan, carvedilol and daclatasvir), each compound has one row and three columns of images. Images were taken with a 20x water objective. Scale bars = 50 mm. (h) The box plot of MMP and ROS intensity of belzutifan, carvedilol, daclatasvir and betrixaban. ROS: reactive oxygen species. MMP: mitochondrial membrane potential.
In an endeavor to comprehend the results more profoundly, we researched the mechanism of action (MOA) of the three compounds. Despite diverging therapeutic indications - belzutifan for Von Hippel-Lindau (VHL) syndrome-associated clear-cell renal cell carcinoma (ccRCC), carvedilol for hypertension, and daclatasvir for Chronic hepatitis C genotype 3 (GT-3 HCV) - their MOAs suggest that they may exert similar effects on cardiomyocytes. Belzutifan inhibits hypoxia-inducible factor 2α (HIF-2α), and research suggests that HIF-2α inhibitors can reverse pulmonary hypertension and that there are shared pathophysiologic mechanisms between cancer and heart failure40,41. Carvedilol, a nonselective beta-adrenergic antagonist, has anti-free radical and antioxidant effects, and can resist oxidation and reduce ROS production42. It further impedes the cardiac Mitochondrial Permeability Transition (MPT), which can depolarize mitochondrial membranes and uncouple oxidative phosphorylation (OXPHOS)42. Daclatasvir, a pangenotypic NS5A replication complex inhibitor with a dual antiviral effect, inhibits RNA replication and viral assembly. Cellular ROS levels rise during HCV infection, and evidence suggests that anti-RNA viral drugs are associated with intracellular ROS levels43,44. Given that cell phenotype reflects the proper classification of drug mechanisms, the application of deep learning holds substantial promise for future endeavors in drug discovery and therapeutic indication investigations.
X-Profiler can act as an effective cell-slice filter
To showcase the capability of X-Profiler in distilling the most salient single-cell images, we generated an attention heatmap using X-Profiler, which highlighted the cells pertinent to the task throughout the entirety of the image concerning the cardiotoxicity task (Supplementary Fig. 1a and Supplementary Fig. 1b). As indicated by the self-attention mechanism, the similarity between the embeddings of single cell-slice image and the embeddings of the CLS token is positively associated with the classification task45. In the heatmap, cells that are more relevant to the downstream task are represented by brighter and warmer pink dots, and these dots clearly illustrate the differences between different cells in terms of task correlation. In assessing the Transformer’s capacity to segregate specific cell slices, we analyzed the attributes of task-relevant single-cell slice images alongside those deemed irrelevant. We presented the landscape of relevant and irrelevant single cells image features in Fig. 5a, in which the dots symbolize task-related images, while inverted triangles represent those unrelated to the task. The single cells image features unrelated to the downstream tasks were indistinguishable among multiple compounds, while the single cells image features related to tasks were separately clustered and widely spaced apart in accordance with their respective compounds. This demonstrates that the multiple cell slices encoder can operate as a cell filter. Two ancillary observations emerged from this clustering exercise: (1) In cellular experiments involving various compounds with distinct properties, there are often both dead cells and unaffected normal cells that form a common group and do not contribute to classification. After passing through the single-cell slice encoder, the features of these cells are similar; (2) Upon exposure to different compounds, the cell phenotypes undergo more significant changes, making them markedly distinct from each other.
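The filtering idea can be sketched as a ranking step: score each single-cell embedding by its similarity to the CLS embedding and keep the top-ranked cells. Using cosine similarity as the relevance score and a fixed `keep_fraction` cut-off are assumptions made for illustration; the text does not specify the threshold that separates relevant from irrelevant slices. All embeddings below are made up.

```python
import math


def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norms = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norms


def rank_cells_by_cls(cls_embedding, cell_embeddings):
    """Return (cell_index, score) pairs, most similar to CLS first."""
    scored = [(idx, cosine(cls_embedding, emb))
              for idx, emb in enumerate(cell_embeddings)]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


def split_relevant(ranking, keep_fraction=0.5):
    """Split ranked cell indices into task-relevant and task-irrelevant."""
    n_keep = max(1, math.ceil(len(ranking) * keep_fraction))
    relevant = [idx for idx, _ in ranking[:n_keep]]
    irrelevant = [idx for idx, _ in ranking[n_keep:]]
    return relevant, irrelevant


# Made-up 2-d embeddings: cells 0 and 2 align with the CLS direction.
cls_vec = [1.0, 0.0]
cell_vecs = [[0.9, 0.1], [0.0, 1.0], [1.0, 0.0], [-1.0, 0.0]]
ranking = rank_cells_by_cls(cls_vec, cell_vecs)
relevant, irrelevant = split_relevant(ranking, keep_fraction=0.5)
```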
To further demonstrate the screening capabilities of X-Profiler for cell phenotype, we compared cell-slice images that were considered relevant and irrelevant based on the X-Profiler attention map (Fig. 5b). Each cell-slice image was comprised of three channels: nucleus (DNA), mitochondrial membrane potential (MMP) and mitochondrial reactive oxygen species (ROS). Task relevant images exhibited more accurate staining, higher fluorescence intensity, greater contrast between channels compared to the task irrelevant images. Moreover, relevant images exhibited more distinct cell structures in the ROS and MMP channels, leading to more accurate cell identification. In terms of image quality, task relevant images were lucid and free of impurities, in stark contrast to the irrelevant images, which exhibited a halo on the image surface caused by optical structural artifacts, resulting in unclear images and high noise levels.
X-Profiler can act as a filter to identify cell slices that are relevant to a downstream classification task. (a) The landscape map of X-Profiler embeddings of relevant (dots) and irrelevant (inverted triangles) cell-slice images of 5 randomly selected compounds. (b) Task-relevant (left) and task-irrelevant (right) images of the compound belzutifan. Images were taken with a 20x water objective. Scale bars = 50 mm.
Cell phenotype images are distinct from general images in two key ways. First, they contain a high degree of redundancy, with dark backgrounds and bright cells dominating most of the image. As a result, it can be challenging to distinguish differences between cells using generic training. Second, cell phenotype images are often affected by noise resulting from experimental manipulations and batch-to-batch variability, which necessitates varied concentration gradients and experimental duplicates to mitigate stochastic influences. Prevalent models and processing techniques have struggled to address these challenges, often yielding poor results. In contrast, the attention mechanism employed in X-Profiler enables the extraction of highly informative, accurate, and structurally intact single-cell slice images, while filtering out uninformative, erroneous, and poor-quality images. This selective extraction improves the model’s performance.
Thanks to the advancement of current high-throughput and high-content techniques, enormous amounts of biological data are generated every moment, helping us gain a better understanding of life. However, significant noise accompanies the true signal in these data. For example, the Cell Painting technique allows us to stain various organelles simultaneously, but the ability to use these images to establish biologically relevant models depends largely on the algorithm used to distinguish the information from the noise. Current methods generally treat every slice of the data, such as each single-cell image, equally, and are therefore limited in fully utilizing the high-content data. Here we propose a deep learning model that has the Transformer architecture at its core to assign different weights to different pieces of data. By doing this, we are able to build models that pay more attention to high-quality data and less attention to low-quality data, which effectively suppresses the effect of noise. The results in this study show that, by paying different amounts of attention to different pieces of data, our models achieve better performance on the tasks described above. The same approach could also be applied to broader areas to improve the performance of current methods.
Model parameter selection
Numerous parameters can influence the performance of a model, such as epochs, training batches, learning rate, optimizers, and transformer attention layers, to name a few. In this segment of our research, we evaluated a selection of critical parameters, excluding subjective human expertise. Initially, we explored the variable of patch numbers, a key element of our study, by conducting trials during the training and validation phases with patch numbers ranging from 8 to 12, and during the prediction phase with a broader range from 10 to 750. Our findings indicated that the optimal patch number for training and validation was 8, while during the prediction phase, 100 proved to be most effective. This was evidenced by the superior performance metrics, including accuracy, area under the curve (AUC), and recall, which peaked when the training and validation patch number were set to 8 and demonstrated a plateau in improvement as the prediction patch number reached 100 (Fig. 6a). Regarding the consumption of computing resources, the quantity of patches used during the training and validation phases exhibited no tangible impact on the duration of processing or GPU utilization. Nevertheless, a discernible improvement in resource usage was correlated with an increase in the number of patches designated for the prediction phase (Fig. 6b and c). Additionally, our study encompassed trials concerning the batch size and the attention layers. The outcomes revealed that the batch size bore no significant consequence on the model’s performance and the optimal configuration for the attention layers was determined to be three (Supplementary Table 3).
X-Profiler performance and resource utilization. (a) The model’s performance across various patch numbers, encompassing both its training and prediction processes. (b) The computational time usage of the model across different patch numbers. (c) The GPU utilization of the model across different patch numbers.
Conclusion and future directions
X-Profiler is a novel HCA process that, in contrast to DeepProfiler and CellProfiler, leverages both our understanding of cellular images and advanced deep learning algorithms. It excels in screening and utilizing informative and high-quality cell-slice images to extract task-relevant cellular features, resulting in superior performance in three drug property-related tasks: (1) determination of drug inhibition of hERG ion channels, (2) prediction of drug-induced mitochondrial toxicity, and (3) classification of compounds. In addition, X-Profiler can group different compounds based on similar cell phenotypes, highlighting its potential in the field of drug repurposing and indication expansion.
In conclusion, HCS is an emerging field that is still rapidly evolving in terms of experimental implementation and analytical methods, and it has the potential to address diverse biological problems. The key to achieving credible results in downstream tasks lies in the ability of analytical tools or computational models to select the useful parts from a large number of cell images of varying quality. Among the tested solutions for downstream tasks, CellProfiler and DeepProfiler showed only moderate performance, while X-Profiler showed promising results. This is because X-Profiler, which leverages deep learning to use multiple cell slices simultaneously for model training and verification, can accurately capture key information and eliminate noise interference. Additionally, X-Profiler can distinguish between cells with similar phenotypes that were perturbed by different compounds, suggesting many possibilities for downstream applications. Overall, these results highlight the potential of X-Profiler as a valuable tool for high-content cell image analysis, which can contribute to various fields such as drug discovery and development, disease diagnosis, and personalized medicine.
In the next phase of our research, we plan to expand the application of X-Profiler to more cell phenotyping tasks and high-content images induced by other perturbations, such as RNA interference (RNAi) or CRISPR Cas9. Our goal is to demonstrate the versatility and effectiveness of our model in analyzing various types of cellular image data. However, we also acknowledge that algorithms based on cell phenotype images have limitations, particularly when the changes in cell phenotype induced by a compound are minimal, which may make it difficult for the model to accurately assess its toxicity and other characteristics. To address this issue, we intend to explore the integration of other high-level information sources, such as transcriptome data or videos of cells captured under brightfield conditions, using multimodal learning techniques. We believe that incorporating multiple sources of information will enhance the accuracy and robustness of our model, enabling it to tackle a wider array of biological problems.
Methods
Constructing the cardiotoxicity high-content image data
The hiPSC-CMs at day 30 after cardiac induction were cryopreserved as Cauliscell hiPSC-CMs (Cauliscell Company, Nanjing, China) and thawed in a 37 °C water bath with gentle shaking. After centrifugation and counting, the cells were added to 384-well plates precoated with 10 µg/ml recombinant human vitronectin at 12,000 cells/well (Cauliscell) in cardiomyocyte plating medium (Cauliscell). After 24 h, the volume was replaced with cardiomyocyte maintenance medium, which was changed every other day. Once the cells started to beat rhythmically, we added the compounds, which were selected from the FDA-approved compound library (FDA-approved drug library, MedChemExpress) for cardiotoxicity testing.
To obtain an appropriate image dataset for the hERG inhibition task, we selected 100 compounds for HCS experiments and generated the corresponding high-content images using a high-content imager. We first collected all compounds with hERG IC50 values through the Python API of the ChEMBL database, and subsequently selected 100 compounds by deduplicating entries and filtering out drug molecules with ambiguous hERG IC50 values, which had been determined experimentally using the patch clamp technique46. The compounds were then labeled as cardiotoxic or non-cardiotoxic based on their hERG IC50 values using a threshold of 10 µM (less than or equal to 10 µM for toxic labeling and greater than 10 µM for nontoxic labeling). This resulted in 47 cardiotoxic and 53 non-cardiotoxic compounds. Among the cardiotoxic drugs, 23 showed hERG-inhibiting activity and targeted the gene potassium voltage-gated channel subfamily H member 2 (KCNH2). Detailed information for each compound, including plate name, drug name, hERG IC50 value range, and cardiotoxicity label, is provided in Supplementary Table 2.
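The labeling rule can be written down in a few lines. The sketch below only illustrates the 10 µM threshold stated above; the function name and the example IC50 values are hypothetical and are not taken from Supplementary Table 2.

```python
HERG_IC50_THRESHOLD_UM = 10.0  # threshold stated in the text (µM)


def herg_label(ic50_um: float) -> str:
    """Return the cardiotoxicity label for a hERG IC50 value given in µM."""
    # IC50 <= 10 µM is labeled toxic, IC50 > 10 µM is labeled nontoxic.
    if ic50_um <= HERG_IC50_THRESHOLD_UM:
        return "cardiotoxic"
    return "non-cardiotoxic"


# Hypothetical IC50 values, for illustration only.
example_ic50_um = {"compound_A": 0.5, "compound_B": 10.0, "compound_C": 35.0}
labels = {name: herg_label(value) for name, value in example_ic50_um.items()}
```

Note that a compound sitting exactly at 10 µM falls on the toxic side of the threshold.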
Each selected compound was added to the wells of 384-well plates at working concentrations of 10 µM, 3.33 µM, 1.11 µM, 0.37 µM, 0.12 µM, and 0.04 µM, with 3 replicates per compound at each concentration. The control group was set with 0.1% DMSO (Sigma), and the blank control was set with water. After incubating the cells with compounds for 72 h, working concentration dyes of CM-H2DCFDA (Thermo) at 5 µM, TMRM (Thermo) at 20 nM, and Hoechst33342 (Thermo) at 5 µg/ml were prepared with maintenance medium (Cauliscell) and added to the cells for 30 min. The cells were washed with HBSS (Beyotime Biotechnology), and YOYO-3 (Thermo) was added at a final concentration of 2 nM, followed by incubation at room temperature for 10 min and washing with HBSS (Beyotime Biotechnology). Finally, the 384-well plate (PerkinElmer) was placed on a high-content imager (Operetta CLS, PerkinElmer), and images were taken with a 20x water objective in confocal mode with 9 fields of view per well, using the filter parameters shown in Supplementary Table 1.
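The six working concentrations are consistent with a 3-fold serial dilution starting at 10 µM. The short sketch below reproduces the series and the resulting per-compound imaging load; the 3-fold factor is an inference from the listed values rather than something the protocol states, and controls are not counted.

```python
TOP_CONCENTRATION_UM = 10.0
DILUTION_FACTOR = 3      # assumption: inferred from the listed concentrations
N_CONCENTRATIONS = 6
N_REPLICATES = 3
FIELDS_PER_WELL = 9

# 10, 10/3, 10/9, ... rounded to two decimals as in the protocol.
concentrations_um = [
    round(TOP_CONCENTRATION_UM / DILUTION_FACTOR**i, 2)
    for i in range(N_CONCENTRATIONS)
]

# Wells and fields of view needed per compound (excluding DMSO and blank controls).
wells_per_compound = N_CONCENTRATIONS * N_REPLICATES
fields_per_compound = wells_per_compound * FIELDS_PER_WELL
```

Under these numbers each compound occupies 18 wells and yields 162 fields of view.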
Constructing the mitochondrial toxicity high-content image data
For the mitochondrial toxicity task, we used open-source high-content images from Bray et al. [20]. High-content images from a total of 55 plates, covering numerous compounds, were downloaded. Mitochondrial toxicity labels were obtained from the PubChem Database (https://pubchem.ncbi.nlm.nih.gov/assay/pcget.cgi?query=download&record_type=datatable&actvty=all&response_type=save&aid=720637) and assigned to each compound; the source assay is the qHTS assay for small-molecule disruptors of the mitochondrial membrane potential from the Tox21 program. The development history of the Tox21 “10K” chemical library, together with the data workflows implemented to ensure quality chemical annotations and allow for various reproducibility assessments, is described elsewhere.
CellProfiler and DeepProfiler data processing
For the cardiotoxicity and mitochondrial toxicity high-content images, we extracted cell phenotype features using the standard procedures of CellProfiler (version 4.2.4). After processing these features, we used the LGBMClassifier model from the LightGBM package, which exposes a scikit-learn-compatible interface, to perform the toxicity classification tasks. Once CellProfiler had identified the locations of single cells in the images, we passed the images and locations to DeepProfiler (version 0.3.1). For the classification tasks, we used the pretrained EfficientNet model deployed within DeepProfiler to calculate embeddings. The CellProfiler-LightGBM and DeepProfiler results were evaluated and compared with those of X-Profiler to assess accuracy and generalizability. For all three models (X-Profiler, DeepProfiler, and CellProfiler-LightGBM), we partitioned the 285 compounds labeled with mitochondrial toxicity into a training set of 190 compounds and a test set of 95 compounds. The 100 compounds of the cardiotoxicity dataset, obtained from the ChEMBL database and labeled by hERG IC50, were divided into training and test sets containing 68 and 32 compounds, respectively.
ROS and MMP measurement
To measure ROS and MMP, we utilized cell phenotype features calculated from our high-content images of 100 compounds, which were used for cardiotoxicity assessment. We designed two CellProfiler analysis protocols that can automatically detect and quantify fluorescence intensity, which proved to be useful for analyzing large image datasets. The ROS and MMP measurements were obtained from the suppressed fluorescent channels of the H2DCFDA and TMRM cell-based assay kit. We computed a reduced dataset with the well-mean feature vector per well, followed by normalizing all features by subtracting the mean of each plate layout from each feature.
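The aggregation and normalization steps can be sketched as follows. This is a minimal illustration, assuming that single-cell features arrive as a NumPy array with one well identifier per row and that "plate layout" means all wells of one plate; the function names, the `PLATE_WELL` identifier format, and the toy values are hypothetical.

```python
import numpy as np


def well_means(features, well_ids):
    """Average single-cell feature rows into one mean vector per well."""
    wells = np.unique(well_ids)
    means = np.stack([features[well_ids == w].mean(axis=0) for w in wells])
    return wells, means


def subtract_plate_mean(well_features, plate_ids):
    """Center every feature by the mean over the wells of the same plate."""
    centered = well_features.astype(float).copy()
    for plate in np.unique(plate_ids):
        mask = plate_ids == plate
        centered[mask] -= well_features[mask].mean(axis=0)
    return centered


# Toy data: four cells, two features, wells named "<plate>_<well>".
cell_features = np.array([[1.0, 10.0], [3.0, 30.0], [5.0, 50.0], [7.0, 70.0]])
cell_wells = np.array(["P1_A01", "P1_A01", "P1_A02", "P2_A01"])

wells, per_well = well_means(cell_features, cell_wells)
plates = np.array([w.split("_")[0] for w in wells])
normalized = subtract_plate_mean(per_well, plates)
```

Centering per plate removes plate-level intensity offsets before wells from different plates are compared.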
X-Profiler model design
X-Profiler is a plug-and-play flexible model framework that is not restricted to a particular backbone. The model processes cell-slice images by normalizing and grouping them by compound and concentration. First, single cell locations are extracted from CellProfiler by calculating the center coordinate of the nucleus. Second, images are rescaled with the global fluorescence intensity, and third, DeepProfiler crops cell slices from 3 channels based on the x and y coordinates for the center of a single nucleus. In all experiments, cell slices were cropped from a region of 96 × 96 pixels centered on the nucleus without resizing. The resulting 96 × 288-pixel images were preserved for model construction.
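The cropping geometry can be illustrated with NumPy. This is a sketch of the idea only and not DeepProfiler's implementation: the function name is hypothetical, and the zero-padding used for nuclei near the image border is an assumption, since the text does not say how such cells are handled.

```python
import numpy as np

CROP_SIZE = 96  # crop edge length in pixels, as stated in the text


def crop_cell_slice(image, cx, cy, size=CROP_SIZE):
    """Crop a size x size window centered on (cx, cy) from a (C, H, W) image
    and tile the channel crops side by side, giving a (size, C * size) array."""
    half = size // 2
    # Assumption: zero-pad so nuclei close to the border still give full crops.
    padded = np.pad(image, ((0, 0), (half, half), (half, half)))
    # After padding by `half`, the window centered on (cy, cx) starts at (cy, cx).
    y0, x0 = int(round(cy)), int(round(cx))
    crops = padded[:, y0:y0 + size, x0:x0 + size]
    return np.concatenate(list(crops), axis=1)


# Toy 3-channel field of view with one bright pixel at the nucleus center.
field = np.zeros((3, 200, 300))
field[:, 100, 150] = 1.0
cell_slice = crop_cell_slice(field, cx=150, cy=100)
```

With three channels the result is 96 × 288 pixels, matching the image size given above.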
Cell-slice images stained with different dyes were integrated as different channels into a single multichannel cell-slice image. This approach reduces noise interference and allows the model to perform the classification task using a few representative cells. Furthermore, the training phase inputs only a limited number of cell images at a time, which prevents the premature overfitting that the complex information in too many cell images would otherwise cause.
The X-Profiler model consists of three main parts: a single cell-slice encoder, a multiple cell-slice encoder, and a classifier. The single cell-slice encoder uses a uniform CNN (ResNet18) to obtain an unbiased latent representation of each single cell-slice image and to improve the generalization of that representation. The multiple cell-slice encoder is built from a Transformer, whose strong contextualization capabilities integrate information from each element in the sequence. A CLS token is prepended to the sequence to filter out irrelevant cell representations and improve model robustness. The self-attention mechanism of the Transformer fuses the latent representations into the representation corresponding to the CLS token, which is then fed into a classifier consisting of a linear layer. The overall flow of prediction is illustrated in Fig. 1.
X-Profiler training and inference
During the training process, we used the standard supervised training method with cross-entropy as the loss function, along with learning rate warm-up and cosine decay techniques. In order to account for various experimental and computational factors, as well as potential inaccuracies in the labels themselves, we also employed label smoothing to reduce the model’s confidence.
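The effect of label smoothing on the cross-entropy loss can be shown with a small NumPy sketch. It assumes the common convention in which the target places 1 − ε on the true class and spreads ε uniformly over all K classes; the text does not state which variant was used, and the function name is hypothetical.

```python
import numpy as np


def smoothed_cross_entropy(logits, labels, smoothing=0.1):
    """Mean cross-entropy between softmax(logits) and label-smoothed targets."""
    logits = np.asarray(logits, dtype=float)
    n, k = logits.shape
    # Numerically stable log-softmax.
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    # Targets: smoothing / k on every class, plus (1 - smoothing) on the true one.
    targets = np.full((n, k), smoothing / k)
    targets[np.arange(n), labels] += 1.0 - smoothing
    return float(-(targets * log_probs).sum(axis=1).mean())


# A very confident, correct prediction is penalized once smoothing is applied.
confident = [[10.0, -10.0]]
loss_plain = smoothed_cross_entropy(confident, [0], smoothing=0.0)
loss_smoothed = smoothed_cross_entropy(confident, [0], smoothing=0.1)
```

The smoothed loss stays well above zero for the confident prediction, which discourages the model from producing extreme logits when the labels themselves may be noisy.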
To enhance the robustness of our model, we applied random cropping during training. Single-cell images were first resized to a resolution of 128 × 128 pixels, and then randomly cropped to 96 × 96 pixels. Additionally, we utilized TrivialAugment47, which randomly selects one augmentation method from a diverse set of image augmentation techniques for each sample. This approach helps in improving the generalization capability of the model. During the testing phase, we used center cropping instead of random cropping and omitted the TrivialAugment step to ensure consistency in evaluation.
The optimized X-Profiler models were trained for up to 50 epochs. To prevent overfitting, we employed dropout and the weight-decay regularization provided by the AdamW optimizer. Model training was implemented using PyTorch (1.12.0 + cu116). A workstation with an Intel(R) Xeon(R) Gold 5320 (104 core processors, 2.2 GHz) CPU and two NVIDIA A100 graphics-processing units was used for model training.
We used the AdamW optimizer with a weight decay of 0.05. The batch size was set to 48. The transformer model consisted of 2 layers with 8 heads, and the hidden state dimension was 512. We applied label smoothing with a factor of 0.1 and used a learning rate warm-up for the first 5 epochs. These parameters were consistently applied across different tasks.
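A warm-up followed by cosine decay can be sketched as a plain function of the epoch index. The base learning rate is not reported in the text, so `base_lr` below is a hypothetical placeholder; the linear warm-up shape, the per-epoch (rather than per-step) update, and the decay to zero are likewise assumptions.

```python
import math


def learning_rate(epoch, base_lr, warmup_epochs=5, total_epochs=50, min_lr=0.0):
    """Linear warm-up for `warmup_epochs`, then cosine decay down to `min_lr`."""
    if epoch < warmup_epochs:
        # Ramp linearly so that the last warm-up epoch reaches base_lr.
        return base_lr * (epoch + 1) / warmup_epochs
    progress = (epoch - warmup_epochs) / max(1, total_epochs - warmup_epochs)
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


BASE_LR = 1e-3  # hypothetical value, for illustration only
schedule = [learning_rate(epoch, BASE_LR) for epoch in range(50)]
```

With these defaults the rate rises over the first 5 epochs, peaks at `base_lr`, and then falls smoothly over the remaining 45 epochs.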
We observed that increasing the number of patches generally improves performance but also significantly increases computational costs. Moreover, excessively large patch numbers can lead to overfitting. Through extensive testing with various settings (2, 4, 8, 12, 16, 32), we found that using 8 patches provided the best balance between model performance and computational efficiency, as validated on the test set.
In all our tests, we employed five-fold cross-validation, using the average result of these five experiments as the final outcome. This approach ensures robustness and reliability in our performance evaluation.
In the testing phase, we made a slight modification to the training approach. Whereas in training we randomly selected 8 cell-slice images to form each multiple cell-slice sequence, in testing we increased the number of cell slices per sequence, evaluating sequence lengths of 12, 50, 100, 150, 200, 250, 300, 350, 400, 450, 500, 550, 600, 650, 700, and 750. To balance model performance against computational cost, we randomly sampled 100 sets of multiple cell slices and took the average classification result over those sequences as the final outcome. This approach improves the chances of selecting valuable cell-slice images while reducing computational cost, because not every individual cell has to be evaluated.
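The sampling-and-averaging procedure can be sketched as follows. `predict_fn` is a hypothetical stand-in for the trained model that maps one sequence of cells to class probabilities; the fallback to sampling with replacement when fewer cells are available than requested is an assumption, as the text does not cover that case.

```python
import numpy as np


def predict_compound(cells, predict_fn, n_cells=100, n_sets=100, seed=0):
    """Average predictions over `n_sets` random sequences of `n_cells` cells."""
    rng = np.random.default_rng(seed)
    n_available = len(cells)
    outputs = []
    for _ in range(n_sets):
        # Assumption: sample with replacement only if there are too few cells.
        idx = rng.choice(n_available, size=n_cells, replace=n_available < n_cells)
        outputs.append(predict_fn(cells[idx]))
    return np.mean(outputs, axis=0)


def toy_predict(sequence):
    """Toy stand-in model: probability = fraction of cells with a positive first feature."""
    p = float((sequence[:, 0] > 0).mean())
    return np.array([1.0 - p, p])


toy_cells = np.abs(np.random.default_rng(1).normal(size=(250, 4))) + 0.1
average_probs = predict_compound(toy_cells, toy_predict)
```

Averaging over many random sequences reduces the variance introduced by any single draw of cells.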
X-Profiler attention map
As we utilized the CLS token embeddings generated by Transformer for classification purposes, we obtained the attention map by calculating the dot product between the query representation and key representation among the tokens in Transformer. By applying the Softmax function, each element of the attention map was assigned a value ranging from 0 to 1.
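The computation can be illustrated for a single attention head with NumPy. This is a sketch under assumptions: the scaling by the square root of the key dimension follows the standard Transformer formulation and is not stated explicitly in the text, the projection matrices are random placeholders rather than trained weights, and how multiple heads or layers were combined is not specified.

```python
import numpy as np


def attention_map(tokens, w_q, w_k):
    """Softmax-normalized query-key dot products for one attention head."""
    q = tokens @ w_q
    k = tokens @ w_k
    scores = q @ k.T / np.sqrt(q.shape[-1])        # scaled dot product
    scores -= scores.max(axis=-1, keepdims=True)   # numerical stability
    weights = np.exp(scores)
    return weights / weights.sum(axis=-1, keepdims=True)


rng = np.random.default_rng(0)
dim, n_cells = 16, 8
tokens = rng.normal(size=(1 + n_cells, dim))  # row 0 plays the role of the CLS token
w_q = rng.normal(size=(dim, dim))             # placeholder projections
w_k = rng.normal(size=(dim, dim))

attn = attention_map(tokens, w_q, w_k)
cls_to_cells = attn[0, 1:]  # how strongly the CLS token attends to each cell
```

Each row of the map sums to 1, so the CLS row can be read as a weighting over the cells in the sequence.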
Cluster and statistical analysis
To evaluate the performance of feature extraction methods for cell-slice image analysis, we used the t-SNE algorithm in the Python scikit-learn package to reduce the dimensionality of the features or embeddings computed by X-Profiler, CellProfiler, and DeepProfiler. We then examined the resulting component distributions to investigate the relationship between X-Profiler and the cell phenotype observed in images, as well as the effectiveness of X-Profiler compared to CellProfiler or DeepProfiler in capturing useful information. To assess the clustering performance, we calculated the silhouette coefficient and the adjusted Rand index using the silhouette score and adjusted Rand score functions in scikit-learn. We applied the Kruskal‒Wallis test to test the null hypothesis that the MMP or ROS intensities of images treated with a compound at six different concentrations (0.04 µM, 0.12 µM, 0.37 µM, 1.11 µM, 3.33 µM, and 10 µM) were equal, setting significance at p < 0.05.
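The analysis steps map onto standard scikit-learn and SciPy calls, sketched below on synthetic data. The data, the perplexity value, and the use of k-means to obtain cluster assignments for the adjusted Rand index are assumptions made for illustration; the text does not say how cluster assignments were derived.

```python
import numpy as np
from scipy.stats import kruskal
from sklearn.cluster import KMeans
from sklearn.manifold import TSNE
from sklearn.metrics import adjusted_rand_score, silhouette_score

rng = np.random.default_rng(0)

# Synthetic embeddings: two well-separated groups of 30 samples each.
embeddings = np.vstack([
    rng.normal(0.0, 1.0, size=(30, 16)),
    rng.normal(10.0, 1.0, size=(30, 16)),
])
true_labels = np.repeat([0, 1], 30)

# Reduce to two dimensions with t-SNE.
coords = TSNE(n_components=2, perplexity=10, random_state=0).fit_transform(embeddings)

# Silhouette coefficient of the known groups in the reduced space.
silhouette = silhouette_score(coords, true_labels)

# Adjusted Rand index between k-means clusters (an assumption) and the known groups.
clusters = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(coords)
ari = adjusted_rand_score(true_labels, clusters)

# Kruskal-Wallis test across six synthetic concentration groups.
intensity_groups = [rng.normal(loc=i, scale=0.1, size=10) for i in range(6)]
statistic, p_value = kruskal(*intensity_groups)
significant = p_value < 0.05
```

Both clustering metrics are computed on the t-SNE coordinates here; they could equally be computed on the original embeddings.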
Data availability
Data is provided within the manuscript or supplementary information files.
References
Ashburn, T. T. & Thor, K. B. Drug repositioning: Identifying and developing new uses for existing drugs. Nat. Rev. Drug Discov. 3, 673–683 (2004).
Ni, J. et al. Obstacles and opportunities in Chinese pharmaceutical innovation. Globalization Health 13, 1–9 (2017).
Perakakis, N., Yazdani, A., Karniadakis, G. E. & Mantzoros, C. Vol. 87 A1–A9 (Elsevier, 2018).
Paananen, J. & Fortino, V. An omics perspective on drug target discovery platforms. Brief. Bioinform. 21, 1937–1953. https://doi.org/10.1093/bib/bbz122 (2019).
Lusher, S. J., McGuire, R., van Schaik, R. C., Nicholson, C. D. & de Vlieg, J. Data-driven medicinal chemistry in the era of big data. Drug Discov. Today 19, 859–868 (2014).
Mehonic, A. et al. Memristors—From in-memory computing, deep learning acceleration, and spiking neural networks to the future of neuromorphic and bio‐inspired computing. Adv. Intell. Syst. 2, 2000085 (2020).
Chen, H. Machine learning for information retrieval: Neural networks, symbolic learning, and genetic algorithms. J. Am. Soc. Inform. Sci. 46, 194–216 (1995).
Tian, G., Harrison, P. J., Sreenivasan, A. P., Carreras-Puigvert, J. & Spjuth, O. Combining molecular and cell painting image data for mechanism of action prediction. Artif. Intell. Life Sci. 3, 100060 (2023).
Bickle, M. The beautiful cell: High-content screening in drug discovery. Anal. Bioanal. Chem. 398, 219–226 (2010).
Perlman, Z. E. et al. Multidimensional drug profiling by automated microscopy. Science 306, 1194–1198 (2004).
Snijder, B. & Pelkmans, L. Origins of regulated cell-to-cell variability. Nat. Rev. Mol. Cell Biol. 12, 119–125 (2011).
Ljosa, V. et al. Comparison of methods for image-based profiling of cellular morphological responses to small-molecule treatment. J. BioMol. Screen. 18, 1321–1329 (2013).
Young, D. W. et al. Integrating high-content screening and ligand-target prediction to identify mechanism of action. Nat. Chem. Biol. 4, 59–68 (2008).
Sundaramurthy, V. et al. Integration of chemical and RNAi multiparametric profiles identifies triggers of intracellular mycobacterial killing. Cell. host Microbe. 13, 129–142 (2013).
Akbarzadeh, M. et al. Morphological profiling by means of the cell painting assay enables identification of tubulin-targeting compounds. Cell. Chem. Biol. 29, 1053–1064. e1053 (2022).
Castoreno, A. B. et al. Small molecules discovered in a pathway screen target the rho pathway in cytokinesis. Nat. Chem. Biol. 6, 457–463 (2010).
Lejal, V., Cerisier, N., Rouquié, D. & Taboureau, O. Assessment of drug-induced liver injury through cell morphology and gene expression analysis. Chem. Res. Toxicol. 36, 1456–1470 (2023).
Loo, L. H. et al. An approach for extensibly profiling the molecular states of cellular subpopulations. Nat. Methods 6, 759–765 (2009).
Feldman, D. et al. Pooled genetic perturbation screens with image-based phenotypes. Nat. Protoc. 17, 476–512 (2022).
Feldman, D. et al. Optical pooled screens in human cells. Cell 179, 787–799 (2019).
Funk, L. et al. The phenotypic landscape of essential human genes. Cell 185, 4634–4653 (2022).
Liberali, P., Snijder, B. & Pelkmans, L. A hierarchical map of regulatory genetic interactions in membrane trafficking. Cell 157, 1473–1487 (2014).
Fischer, B. et al. A map of directional genetic interactions in a metazoan cell. Elife 4, e05464 (2015).
Yin, Z. et al. A screen for morphological complexity identifies regulators of switch-like transitions between discrete cell shapes. Nat. Cell Biol. 15, 860–871 (2013).
Chandrasekaran, S. N. et al. JUMP cell painting dataset: Morphological impact of 136,000 chemical and genetic perturbations. bioRxiv, 2003. 2023.534023 (2023).
Stirling, D. R. et al. CellProfiler 4: Improvements in speed, utility and usability. BMC Bioinform. 22, 1–11 (2021).
Dao, D. et al. CellProfiler analyst: Interactive data exploration, analysis and classification of large biological image sets. Bioinformatics 32, 3210–3212 (2016).
Rocha, L., Velho, L. & Carvalho, P. C. P. In Proceedings. XV Brazilian Symposium on Computer Graphics and Image Processing. 99–105 (IEEE).
Haralick, R. M., Shanmugam, K. & Dinstein, I. H. Textural features for image classification. IEEE Trans. Syst. man. Cybernetics, 610–621 (1973).
Moshkov, N. et al. Learning representations for image-based profiling of perturbations. bioRxiv, 2022.2008.503783 (2022).
Alber, M. et al. Integrating machine learning and multiscale modeling—Perspectives, challenges, and opportunities in the biological, biomedical, and behavioral sciences. NPJ Digit. Med. 2, 115 (2019).
Loo, L. H., Wu, L. F. & Altschuler, S. J. Image-based multivariate profiling of drug responses from single cells. Nat. Methods 4, 445–453 (2007).
Vaswani, A. et al. Attention is all you need. Adv. Neural. Inf. Process. Syst. 30 (2017).
Richard, A. M. et al. The Tox21 10K compound library: Collaborative chemistry advancing toxicology. Chem. Res. Toxicol. 34, 189–216 (2020).
Begriche, K., Massart, J., Robin, M. A., Borgne-Sanchez, A. & Fromenty, B. Drug-induced toxicity on mitochondria and lipid metabolism: Mechanistic diversity and deleterious consequences for the liver. J. Hepatol. 54, 773–794 (2011).
Hargreaves, I. P., Al Shahrani, M., Wainwright, L. & Heales, S. J. Drug-induced mitochondrial toxicity. Drug Saf. 39, 661–674 (2016).
Tang, X., Wang, Z., Hu, S. & Zhou, B. Assessing drug-induced mitochondrial toxicity in cardiomyocytes: Implications for preclinical cardiac safety evaluation. Pharmaceutics 14, 1313 (2022).
Haghighi, M., Caicedo, J. C., Cimini, B. A., Carpenter, A. E. & Singh, S. High-dimensional gene expression and morphology profiles of cells across 28,000 genetic and chemical perturbations. Nat. Methods 19, 1550–1557 (2022).
Meyer, J. N., Hartman, J. H. & Mello, D. F. Mitochondrial toxicity. Toxicol. Sci. 162, 15–23. https://doi.org/10.1093/toxsci/kfy008 (2018).
de Wit, S., Glen, C., de Boer, R. A. & Lang, N. N. Mechanisms shared between cancer, heart failure, and targeted anti-cancer therapies. Cardiovasc. Res. 118, 3451–3466 (2022).
Niemann, B. et al. Caloric restriction reduces sympathetic activity similar to beta-blockers but conveys additional mitochondrio-protective effects in aged myocardium. Sci. Rep. 11, 1931 (2021).
Pereira, C. Drug-induced cardiac mitochondrial toxicity and protection: From doxorubicin to carvedilol. Curr. Pharm. Des. 17, 2113–2129 (2011).
Lozano-Sepúlveda, S. A., Rincón-Sanchez, A. R. & Rivas-Estilla, A. M. Antioxidants benefits in hepatitis C infection in the new DAAs era. Ann. Hepatol. 18, 410–415 (2019).
Reshi, M. L., Su, Y. C. & Hong, J. R. RNA viruses: ROS-mediated cell death. Int. J. cell. Biol. 2014 (2014).
Ghosh, M. C. et al. Therapeutic inhibition of HIF-2α reverses polycythemia and pulmonary hypertension in murine models of human diseases. Blood J. Am. Soc. Hematol. 137, 2509–2519 (2021).
Molleman, A. Patch Clamping: An Introductory Guide to Patch Clamp Electrophysiology (Wiley, 2003).
Müller, S. G. & Hutter, F. In Proceedings of the IEEE/CVF International Conference on Computer Vision. 774–782.
Author information
Contributions
Xiangrui Gao, Xiaodong Wang and Lipeng Lai conceived this research. Fan Zhang, Mengcheng Yao and Xueyu Guo curated the dataset. Xiangrui Gao, Fan Zhang and Xueyu Guo performed data analysis. Xiangrui Gao devised deep learning algorithms. Xiaoxiao Wang and Dong Chen conducted the HCS experiments. Xueyu Guo, Fan Zhang, Xiaoxiao Wang and Genwei Zhang wrote and modified the paper. Xiaodong Wang and Lipeng Lai supervised this work.
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Gao, X., Zhang, F., Guo, X. et al. Attention-based deep learning for accurate cell image analysis. Sci Rep 15, 1265 (2025). https://doi.org/10.1038/s41598-025-85608-9