Abstract
Bone tumors are a major source of morbidity and mortality worldwide. The application of artificial intelligence to medical assistance has fundamentally transformed traditional, labor-intensive diagnostic methods, effectively alleviating the pressure on medical resources. However, the multi-scale nature of bone tumors in medical images, along with complex tumor boundaries and disordered textures, makes it difficult for algorithms to distinguish normal tissue from tumor tissue when relying solely on pixel-level or contextual information for segmentation. To address this, this paper proposes a bone tumor recognition strategy based on object region and context representation (RCROS), which enhances pixel-level features using object region and context representation. The RCROS strategy aggregates pixel representations from each tissue category to estimate the representation of the corresponding object region, and then calculates the relationship between each pixel and its target region. Finally, the object context representation is employed to enhance the representation of each pixel. Experiments were conducted on more than 80,000 MRI images from Huaihua Second People's Hospital. RCROS achieves high accuracy while maintaining low resource consumption. It reduces the time doctors spend viewing images and provides a more accurate reference for clinical decision-making.
Introduction
Bone tumors arise in the bones or their adjacent tissues and represent one of the most common forms of malignant tumors1. They exhibit rapid growth, poor prognosis, and high mortality rates. Statistically, the global incidence of primary malignant bone tumors is approximately 3 per million1. Although the overall prevalence of bone tumors is relatively low compared to other cancers, their mortality rate remains high. Moreover, there are significant disparities in five-year survival rates across countries. In developed countries, the overall five-year survival rate for bone tumors is 85%2. In contrast, the survival rate in developing countries is significantly lower, primarily due to a lack of medical resources. Furthermore, a large population base translates into a higher total number of cases. For instance, in China, there are currently 4,000 to 5,000 new cases of bone tumors each year3, yet the patient survival rate is only 50% to 60%, reflecting the high malignancy of these tumors. Bone tumors pose a significant threat to human health; however, early detection and timely, systematic treatment can substantially improve patient survival.
Imaging currently ranks among the most important modalities for diagnosing bone tumors. The primary imaging techniques include X-ray, CT, MRI, and isotope bone scanning4. MRI is particularly effective for demonstrating the extent of bone tumor lesions and quantifying the involvement of the bone marrow cavity; thus, it is widely employed to assist in bone tumor detection. However, in many developing countries, doctors, hospitals, and governments encounter significant challenges in diagnosing and treating bone tumors due to a lack of resources and limited technological capabilities.
(1) Limited medical resources are in conflict with a large population.
Medical resources are scarce in developing countries, and the health of many people cannot be effectively protected. Take China as an example: the population has reached 1.44 billion, but there are only 3.9 million doctors and 35,394 hospitals. On average, each doctor is responsible for more than 5,000 patients, and each hospital must serve an enormous patient load. In addition, the geographical distribution of hospitals does not match the population distribution: 70% of the population lives in rural areas but has access to only 30% of medical resources, while developed cities hold only 30% of the population yet 70% of medical resources.
(2) The contradiction between outdated medical technology and complex diagnostic processes.
The diagnosis of bone tumors is lengthy and complex. Most developing countries have weak infrastructure, poor medical equipment, and inadequate technology for making accurate diagnoses. In Chinese hospitals, each bone tumor patient generates about 600–700 MRI images, of which only 10–20 are actually useful to clinicians5. Screening the effective images from this huge volume of data is a great challenge.
(3) The contradiction between the subjectivity of manual analysis and the complexity of the disease.
There is no standardized protocol for imaging analysis, so doctors can only rely on their experience, which easily leads to misdiagnosis. Inexperienced doctors, in particular, often miss or misdiagnose the disease in its early stage. A Chinese Medical Association misdiagnosis data report estimates that about 57 million patients are misdiagnosed in China's clinical care each year, a total misdiagnosis rate of 27.8%6. Among malignant tumors, the average misdiagnosis rate reaches 40%.
(4) The shortage of radiologists is at odds with the rapid growth of imaging data.
Currently, imaging data in China is increasing at an annual rate of 30%, while the number of radiologists grows at only 4.1% per year7. This gap of roughly 26 percentage points means that radiologists will face steadily increasing pressure to manage the rising volume of imaging data.
AI-assisted medical imaging diagnosis and treatment decision systems, trained on large-scale imaging data and clinical diagnostic information using techniques such as deep learning and natural language processing, have significantly alleviated healthcare challenges in developing countries8,9. AI has greatly enhanced the ability of medical imaging to screen for diseases and support clinical diagnoses. For example, manually analyzing a bone tumor MRI image takes about half an hour, while AI can complete the task in just a few seconds.
In addition, AI has greatly enhanced functions such as image segmentation, feature extraction, and quantitative and comparative analysis. It can also perform tasks such as automatic target region delineation, lesion identification and annotation, lesion characterization, 3D image reconstruction, and image classification and retrieval8. Image data is widely used as a diagnostic and treatment tool for a variety of medical applications, such as fundus screening, brain segmentation, brain disease diagnosis, organ delineation, bone lesion detection, bone age analysis, pathological biopsy, and dermatology diagnosis support.
However, the morphology of bone tumors in MRI scans is very complex, and current AI-assisted diagnosis methods often fail to accurately depict the lesion area. Bone tumors show significant differences in size, pixel intensity, shape, and location10. Blurred tumor boundaries cause MRI images to exhibit varying degrees of uneven noise, which makes it challenging to accurately locate the tumor boundary and extract complete, precise tumor features. In addition, the signal intensity of a single tissue (region) in an MRI image varies, and the intensity at different locations within the same tissue class also differs11. A single attention mechanism that only distinguishes pixels at different spatial locations is not sufficient to address the challenge of bone tumor lesion segmentation. In MRI images, the center of the lesion shows higher signal intensity and more pronounced features than the tissue boundary. Although some methods have attempted to obtain contextual information through multi-scale representation or atrous (dilated) convolution12,13, these methods remain limited in understanding the relationship between the tumor and surrounding tissues, and they struggle to capture the complete contextual information of the tumor.
Based on this, this paper introduces a segmentation method (RCROS) that effectively exploits regional and contextual information in bone tumor MRI images. To address the challenges posed by complex, disordered textures in MRI images that hinder pixel-level classification, the proposed method integrates contextual pixel information from similar and different tissue types. The segmentation network generates object regions through supervised learning, using doctors' ground truth annotations to ensure model specialization. Furthermore, an attention mechanism is integrated into the segmentation network to enhance the model's focus on the object regions. Finally, the RCROS model unifies segmentation results from the same MRI images processed in different ways, using the mean value as the final output. Each MRI image consists of two mutually corresponding sequences, T1-weighted and T2-weighted. This not only significantly reduces the time doctors need to review images but also provides a more accurate reference for clinical decision-making.
The main contributions of this paper are specified below:
(1) We propose a bone tumor segmentation method (RCROS) that improves the representation of each pixel in MRI images by integrating object region representation with contextual features. This method effectively addresses challenges such as blurred tumor edges and uneven signal intensity distribution in segmentation models.
(2) Introducing a dual attention mechanism: Embedding the attention mechanism into the object region and context representation enhances the model's sensitivity to texture details in MRI images. This effectively highlights important information in a large amount of noisy and irrelevant data, thereby improving the model's accuracy and pixel representation perception.
(3) Real-time SegFix was used to optimize the model's performance in fine segmentation of regions and gaps. The model superimposed all image copies with the original image and selected the majority prediction for each pixel as the final output. This approach significantly improved the model's accuracy in delineating tumor boundaries.
(4) More than 80,000 MRI images from Huaihua Second People's Hospital, Hunan Province, China were used for testing. The results show that our proposed method outperforms other segmentation and bone tumor recognition models. It balances accuracy, parameter count, and computational workload, making it effective in assisting doctors in bone tumor detection and playing a key role in the pre-diagnosis, treatment, and prognosis of bone tumors.
Related work
Deep learning-based image segmentation technology continues to advance, with an increasing number of network structures proposed and validated for their performance. U-Net14 established a foundation for medical image segmentation due to its simple structure and robust segmentation performance. Building on this foundation, many researchers have continued to adopt the U-shaped network structure, with Zhou et al. proposing U-Net++ in 201815. U-Net++ enhances network feature integration by adding skip connections while maintaining the U-shaped structure, and its deep supervision approach further accelerates network inference. Lou et al. proposed a new deep learning model, DC-UNet6, which improves the network's ability to capture fine details by introducing Dual-Channel blocks and Res-Path connections.
However, the accuracy of CNN-based image segmentation does not meet the strict standards required in the medical field, leading to increased use of vision transformers for medical image segmentation. TransUNet16, the first Transformer-based segmentation technique for medical images, was proposed by Chen et al. TransUNet leverages the Transformer's global self-attention mechanism to capture image context, while U-Net encoder skip connections restore local detail for more precise segmentation. Subsequently, Cao et al. proposed Swin-Unet17, a pure Transformer architecture for medical image segmentation. Swin-Unet retains the U-shaped structure of U-Net but replaces the original encoders with self-attention modules, demonstrating the power of Transformer models in image segmentation. Gao et al. proposed UTNet18, a hybrid Transformer architecture that combines CNN and self-attention mechanisms to address traditional CNN limitations in capturing long-range dependencies, leveraging Transformers for global information aggregation. Ji et al. proposed MCTrans19, a unified Transformer network designed to learn cross-scale dependencies and ensure semantic consistency in biomedical image segmentation. MCTrans addresses limitations in existing visual Transformer methods, such as inadequate cross-scale dependency, semantic consistency, and feature representation.
In diagnosing bone tumor diseases, computer-assisted image processing is increasingly becoming a research focus. Anisuzzaman et al.20 used CNNs to train on publicly available osteosarcoma tissue image datasets, comparing two CNN models, VGG19 and Inception V3, and finding that VGG19 achieves higher accuracy in both binary and multi-class classification tasks. Barzekar et al. proposed a unique convolutional neural network structure, C-Net21, which incorporates multiple CNNs connected in series. Nabid et al. proposed an RCNN-based deep learning model22 for identifying tumor cells, necrotic cells, and non-tumor cells in osteosarcoma histopathological images; however, the model suffers from overfitting. Parlak et al.23 investigated diffusion-weighted imaging (DWI) and apparent diffusion coefficient (ADC) values for Ewing sarcoma (ES) and osteogenic sarcoma (OS), performing boundary-case segmentation to accurately delineate the two tumor types.
The above analysis highlights the growing importance of image segmentation in disease diagnosis and prognosis assessment. However, in bone tumor image segmentation, retaining edge features remains challenging due to noise susceptibility and complex, disordered textures, making it difficult to distinguish between normal and tumor tissues using pixel-level or context-only information. This paper proposes an automatic MRI image segmentation system for bone tumors, leveraging object-region and contextual representation to address issues of tumor edge blurring and uneven signal intensity distribution in segmentation models.
System model
In most developing countries, a lack of medical resources and the high cost of medical imaging equipment prevent many patients from receiving timely and effective treatment. Additionally, with a large patient base and a high volume of medical images generated for diagnosis, relying solely on doctors to manually delineate tumor boundaries can easily result in incorrect or missed diagnoses. The use of artificial intelligence on medical imaging data can shorten lesion identification time, enhance analysis and comparison capabilities, and ensure optimal treatment timing. This improves the efficiency and accuracy of doctors' diagnoses, reduces misdiagnosis, and minimizes patients' time and other costs to the greatest extent possible. However, in the diagnosis of bone tumors, the complex and irregular texture of MRI images, coupled with the multi-scale nature of bone tumors, often prevents MRI-based recognition from meeting expectations. To improve segmentation accuracy in bone tumor MRI images, this study proposes an object region and contextual representation-based segmentation method (RCROS) that enhances pixel features. This approach addresses the challenge of predicting the classification of single pixels in bone tumor MRI segmentation. The proposed algorithm not only accurately depicts the tumor boundary but also provides doctors with a more precise reference by comparing the results generated across three MRI image planes and two sequences. This helps reduce the time required for film analysis and supports more accurate treatment plans. The overall architecture of RCROS is shown in Fig. 1.
RCROS consists of two main parts. Section "Data preprocessing" covers MRI image preprocessing, including image binarization and normalization, effective training region selection, and data classification and augmentation. Section "Pixel enhanced segmentation method IMEocr design" describes the segmentation network, which incorporates a pixel-enhancement-based segmentation method (IMEocr) and a boundary optimization algorithm. Table 1 presents the symbols used in this section, along with their definitions.
Data preprocessing
The Artificial Intelligence Medical Imaging Assisted Diagnosis System primarily identifies regions of interest in medical images, automatically or semi-automatically outlines image boundaries, and segments images into various tissue types. However, several challenges arise when segmenting MRI images of bone tumors:
(1) MRI quality issues are unavoidable. Variations in equipment performance often introduce noise into MRI images, and extensive background regions lack relevant information, increasing model training difficulty and wasting resources.
(2) MRI images come in various types, covering three distinct body planes that exhibit significant differences in image features, and the segmentation model learns features differently across these planes. Each image contains both T1-weighted and T2-weighted sequences, and the signal intensities of individual tissues vary across sequences, leading to differing segmentation accuracies.
(3) MRI data is limited and challenging to access. Due to patient privacy and confidentiality concerns, acquiring MRI images is difficult, resulting in a small training dataset. This limitation leads to unstable model training and a higher risk of overfitting.
To address these issues and mitigate the influence of external factors on the bone tumor segmentation model, we performed preprocessing before MRI image segmentation. First, the bone tumor MRI images were binarized and normalized, and irrelevant background regions not useful for model training were removed. Next, we employed a simplified MobileNetV2 model to classify the bone tumor MRI set into three datasets, transverse, sagittal, and coronal, based on the different planes of the human body. Each MRI image contains both T1-weighted and T2-weighted sequences. Finally, we expanded the dataset.
Dataset optimization
The MRI of a bone tumor comprises the tumor region, edematous tissue, normal bone tissue, muscle tissue within the bone marrow, and a blank background region, among others. As illustrated in Fig. 2, the background region constitutes a significant portion of the MRI image. Specifically, in the cross-sectional view, the area corresponding to the human body accounts for approximately 1/8 of the total image. During the segmentation training of bone tumor MRI images, numerous invalid regions consume substantial model memory. Furthermore, these invalid regions exhibit grayscale values similar to those of the tumor regions after T1-weighted imaging, complicating the model’s ability to accurately delineate the tumor regions. Consequently, we processed the MRI images as depicted in Fig. 2 through the following steps.
(1) Image binarization: Given the high signal intensity of human tissues in bone tumor MRI images, along with the darker pixel values of the invalid background, we employ a binarization algorithm to process the MRI images (a consolidated code sketch of steps (1)–(3) follows this list).
$$T_{div} = \mathop{argmax}\limits_{0 \le g_{i} \le 255} \left( {\eta_{0} \cdot N_{{g_{val} < g_{i} }} \cdot \left( {V - V_{{g_{val} < g_{i} }} } \right)^{2} + \eta_{1} \cdot N_{{g_{val} \ge g_{i} }} \cdot \left( {V - V_{{g_{val} \ge g_{i} }} } \right)^{2} } \right)$$(1)
where \(g_{i}\) represents the candidate gray value, \(\eta_{0} + \eta_{1} = 1\), and \(T_{div}\) is the threshold for classifying all pixel points. \(N_{{g_{val} < g_{i} }}\) denotes the number of pixel points with gray values less than \(g_{i}\), \(V_{{g_{val} < g_{i} }}\) is the average gray value of those pixels, and \(V\) is the mean gray value of the whole image; the quantities for gray values greater than or equal to \(g_{i}\) are defined analogously. In the original MRI image, pixel values greater than \(T_{div}\) are designated as bright points with a pixel value of 255, while pixel values less than or equal to \(T_{div}\) are assigned a value of 0, representing black points. After applying Eq. (1), the resulting image consists solely of pure black and pure white.
(2) Image normalization: This step mainly addresses differences in the density and intensity of MRI images caused by different MRI devices and reduces the influence of the external environment on the segmentation model. We therefore use a linear normalization method to process the intensity of all images:
$$I_{norm} = \frac{{I_{ini} - I_{\min } }}{{I_{max } - I_{min } }}$$(2)
where \(I_{ini}\) denotes the gray value of the input MRI image, while \(I_{norm}\) represents the gray value after the color normalization process. \(I_{min }\) and \(I_{max }\) are the minimum and maximum gray values of the image, respectively. Following the function conversion, the range of values is shifted to between 0 and 1. Equation (2) effectively reduces the luminance differences between images, thereby enhancing the model’s training performance and accelerating convergence speed during program execution.
(3) Identification of effective diagnostic regions: To efficiently locate the target region, we initiate a scan from the MRI image boundaries, traversing left to right, right to left, top to bottom, and bottom to top to delineate the human tissue boundary. The linear equations defining the boundaries are as follows: the upper boundary is defined by \(X = X_{max }\), the lower boundary by \(X = X_{min}\), the left boundary by \(Y = Y_{min}\), and the right boundary by \(Y = Y_{max }\). These equations outline the valid regions, as illustrated in Fig. 2. After removing the invalid regions, the size of the MRI image is significantly reduced, which is crucial for minimizing memory consumption in the subsequent segmentation model and allows the model to focus more on feature extraction from the valid regions.
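As a minimal illustration of steps (1)–(3), the following Python sketch applies Otsu-style binarization, crops to the valid tissue bounding box, and performs min-max normalization. The OpenCV-based implementation and the function name are our assumptions; the paper does not specify its tooling.

```python
import cv2
import numpy as np

def preprocess_mri(path):
    """Illustrative preprocessing: binarize (Eq. 1), locate the valid
    tissue bounding box, and min-max normalize intensities (Eq. 2)."""
    img = cv2.imread(path, cv2.IMREAD_GRAYSCALE)

    # Step (1): threshold maximizing between-class variance (Otsu).
    _, binary = cv2.threshold(img, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)

    # Step (3): bounding box of non-background pixels (X_min/X_max rows,
    # Y_min/Y_max columns), used to crop away the invalid region.
    rows, cols = np.any(binary > 0, axis=1), np.any(binary > 0, axis=0)
    x_min, x_max = np.where(rows)[0][[0, -1]]
    y_min, y_max = np.where(cols)[0][[0, -1]]
    cropped = img[x_min:x_max + 1, y_min:y_max + 1].astype(np.float32)

    # Step (2): linear normalization of gray values to [0, 1].
    return (cropped - cropped.min()) / (cropped.max() - cropped.min() + 1e-8)
```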
Preprocessing effectively improves the quality of images and reduces the resource usage of invalid areas during model training. We classify and expand the dataset to enhance the model’s generalization performance, providing physicians with a more suitable and efficient foundation for auxiliary diagnosis.
Data classification and expansion
MRI images of patients with bone tumors cover three planes of the human body: the transverse plane, the coronal plane, and the sagittal plane. For images of different planes, the attention parameters that the segmentation model learns for the classification query in the pre-training stage may differ, so the model must learn the image features of each plane. Classifying the dataset helps reduce both the time the segmentation model spends learning the features of different planes and the frequency of model parameter changes, thereby improving its learning efficiency.
MobileNetV29 is a lightweight convolutional neural network that achieves higher accuracy and faster operational speeds with fewer computations through its innovative Inverted Residual Block structure. Therefore, we have selected MobileNetV2 as the base model for classifying the MRI dataset. Furthermore, as illustrated in Fig. 3, during the classification training process, we implemented several simple enhancements tailored to the characteristics of our bone tumor dataset.
Following classification training, all MRI images are categorized into three sub-datasets: the MRI Transverse Plane dataset (TPDataset), the MRI Sagittal Plane dataset (SPDataset), and the MRI Coronal Plane dataset (FPDataset). Each MRI image within these datasets contains both T1 and T2 sequences. During the training of the bone tumor segmentation model, the MRI images from the three datasets are sequentially input to extract the corresponding attention parameters. During the testing of the bone tumor segmentation model, all images invoke the corresponding segmentation model based on their respective datasets. This approach allows the RCROS model to significantly enhance segmentation accuracy while maintaining efficiency. Additionally, each patient’s images are automatically categorized, with each image corresponding to both T1 and T2 sequences. This capability is crucial for providing a reliable reference for the doctor’s clinical diagnosis in subsequent stages. Doctors can simultaneously review both image sequences and assess the patient’s condition from different perspectives, thereby effectively reducing the time spent retrieving images during film review.
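A minimal sketch of such a plane classifier is shown below, assuming a standard torchvision MobileNetV2 with its final layer replaced by a three-class head; the dataset-specific enhancements of Fig. 3 are not reproduced here.

```python
import torch
import torch.nn as nn
from torchvision import models

# Three target planes: transverse (TPDataset), sagittal (SPDataset),
# coronal (FPDataset).
classifier = models.mobilenet_v2(weights=None)
classifier.classifier[1] = nn.Linear(classifier.last_channel, 3)

# Route a slice to its plane-specific segmentation model (dummy input here).
with torch.no_grad():
    logits = classifier(torch.randn(1, 3, 224, 224))
    plane = logits.argmax(dim=1)  # 0: transverse, 1: sagittal, 2: coronal
```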
Additionally, to enhance the performance of model training, we augmented the dataset. Initially, we randomly rotated the images within the intervals [0°, 90°), [90°, 180°), [180°, 270°), and [270°, 360°), effectively increasing the dataset size to four times its original. Subsequently, we applied a flip to the rotated dataset, further expanding it to eight times its original size. Finally, we conducted random color rendering on all MRI images, resulting in a final dataset 16 times larger than the original, with each MRI image effectively having 15 copies. This process is illustrated in Fig. 1.
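The 16-fold expansion can be sketched as follows; the rotation intervals and flip follow the description above, while the color-jitter strengths are illustrative assumptions, since the paper does not state the parameters of its random color rendering.

```python
import random
from torchvision.transforms import ColorJitter
from torchvision.transforms import functional as TF

color_render = ColorJitter(brightness=0.2, contrast=0.2)  # assumed strengths

def expand_image(img):
    # One random rotation per quadrant interval -> 4 copies.
    copies = [TF.rotate(img, random.uniform(low, low + 90))
              for low in (0, 90, 180, 270)]
    # Flip every rotated copy -> 8 copies.
    copies += [TF.hflip(c) for c in copies]
    # Random color rendering on every copy -> 16 copies per original image.
    copies += [color_render(c) for c in copies]
    return copies
```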
Pixel enhanced segmentation method IMEocr design
In the previous section, we pre-processed the MRI images. In this section, we focus on the detailed design of the MRI segmentation method for bone tumors. This approach consists of two parts: a backbone network and the object context enhancement module (IMEocr). The model utilizes a dilated ResNet-101 (with an output stride of 8) as the backbone network. ResNet-101 delivers two distinct outputs to IMEocr: the representation from stage 3 is used to predict a coarse segmentation (the object regions), while the representation from stage 4 undergoes a 3 × 3 convolution (with 512 output channels). The pixel enhancement segmentation method in IMEocr leverages the organized region representation of the corresponding category to enhance each pixel's representation. After segmentation, we further optimize the results using the SegFix algorithm. In the output phase of the model, all copies of the same image are uniformly processed using a decision composite to integrate the segmentation results.
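The backbone wiring described above might look like the following sketch, assuming a torchvision ResNet-101 dilated to output stride 8; the number of tissue categories K and the two head definitions are illustrative.

```python
import torch.nn as nn
from torchvision.models import resnet101

K = 5  # assumed number of tissue categories (soft object regions)
backbone = resnet101(weights=None,
                     replace_stride_with_dilation=[False, True, True])

# Stage-3 features (1024 channels) -> coarse segmentation (soft regions).
soft_region_head = nn.Conv2d(1024, K, kernel_size=1)
# Stage-4 features (2048 channels) -> 3x3 conv with 512 output channels.
pixel_head = nn.Sequential(
    nn.Conv2d(2048, 512, kernel_size=3, padding=1, bias=False),
    nn.BatchNorm2d(512),
    nn.ReLU(inplace=True))

def forward_features(x):
    # Shared stem and stages; tap stage-3 and stage-4 outputs for IMEocr.
    x = backbone.maxpool(backbone.relu(backbone.bn1(backbone.conv1(x))))
    s3 = backbone.layer3(backbone.layer2(backbone.layer1(x)))
    s4 = backbone.layer4(s3)
    return soft_region_head(s3), pixel_head(s4)
```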
In the recognition of MRI of bone tumors, texture context features based on the relative positions of pixel textures can predict pixel classes more accurately. However, in a practical setting, MRI image processing of bone tumors still faces the following problems:
(1) Bone tumors are a multi-scale problem. Tumor sizes vary across patients and across tumors within the same patient, and even the apparent size of a single tumor may vary with its distance from the MRI scanner.
(2) MRI images have complex textural features, and it is difficult to tell whether a pixel belongs to a focal area or to normal tissue when examining it in isolation. This is especially true at tumor edges, where the signal intensity is relatively weak and tissue brightness is low.
(3) Inhomogeneity in MRI images arises from both the instrumentation and the bone tumor lesions, resulting in varying degrees of noise and blurred, indistinct tumor borders.
These factors introduce significant challenges in the segmentation and recognition of bone tumor images. The probability of a pixel in a bone tumor MRI image belonging to a specific group depends on its inherent characteristics and the distribution of information among the surrounding pixels. Examining a pixel in isolation makes it challenging to determine whether it belongs to the lesion area or normal tissue. Therefore, when modeling each target feature in bone tumor MRI images, it is essential to consider the mutual constraints between them. Currently, numerous studies have proposed various methods for image segmentation; however, none of these models have achieved satisfactory results in MRI image processing for bone tumors. Additionally, the class label \(l_{i}\) of a pixel \(p_{i}\) corresponds to the label of the object to which the pixel belongs. Thus, we propose a bone tumor MRI image segmentation model, IMEocr, which integrates object regions and contextual representations to enhance pixel representation. IMEocr is based on the OCR model and characterizes each pixel by leveraging its corresponding object representation. The specific scheme is illustrated in Fig. 4. The model primarily includes the following points:
(1) To address the limitation of traditional multi-scale contextual models, which do not differentiate between contextual pixel relationships for the same tissues and those for different tissues, IMEocr performs a weighted aggregation of all tissue region representations in an MRI image. It then enhances each pixel's representation using this contextual information.
(2) To address the problem that existing relational context methods do not consider both the pixel-context relationship and the pixel-region relationship, IMEocr considers both factors and constructs the context pixels as object regions.
(3) To address the limitation of a single attentional mechanism and existing attentional modules with dimension-level integration, which cannot effectively handle bone tumor lesions with varying scales, irregular shapes, and diverse color distributions, a dual attentional mechanism is introduced in IMEocr. This mechanism obtains regions through supervised learning based on the segmentation truth values provided by radiologists. Furthermore, the relationship between a pixel and a region is calculated by considering both the pixel representation and the region representation.
IMEocr takes the segmentation results and computes a relational weight for each pixel relative to the other pixels in the MRI image. This weight is then overlaid with the original feature map to enhance the accuracy of the segmentation results. We employ a 1 × 1 convolutional linear function to predict the coarse segmentation (soft object region) from the first representation of backbone stage 3 and the final segmentation from the last representation. Additionally, we apply the pixel-by-pixel Tversky Loss during the final segmentation prediction. The implementation steps are outlined as follows.
Step 1. Target region extraction in MRI images
We divide the MRI image I of bone tumor, obtained after the initial segmentation by the backbone network, into K soft tissue regions, \(A_{1} ,A_{2} , \ldots ,A_{K}\), which include muscle tissue, fat, joint fluid, and yellow bone marrow from both the tumor region and normal tissue. Each object region \(A_{K}\) corresponds to category k and is represented by a 2D map, referred to as a coarse segmentation map. Each entry indicates the degree to which the corresponding pixel belongs to category k. This means that each entry corresponds to either normal human tissue or a tumor region.
The output representation of the backbone network stage 3 is used to compute K object regions. During training, the Tversky Loss method generates object regions through supervised learning based on ground truth segmentation. The ground truth consists of tumor boundaries manually segmented by radiologists.
Step 2. Calculate the region representation for each pixel point
Tumor regions exhibit high signal intensity in MRI images; however, there are variations in brightness among different tissues. Weighted aggregation of pixel representations within a region can enhance the accuracy of pixel predictions. We perform weighted aggregation of the representations of all pixels in the MRI image. The weight reflects the degree to which each pixel belongs to the k-th tissue region, resulting in the \(f_{k}\) representation of the k-th tissue region. Specifically, softmax is applied to the soft tissue region calculated in step 1 to obtain the category information of each pixel, which is then multiplied by the original pixel representation. This step aims to estimate the representation of each target region by aggregating the representations of all pixels corresponding to this region in the MRI image.
$$f_{k} = \sum\limits_{i \in {\mathcal{I}}} {\tilde{\omega }_{ki} x_{i} }$$(6)

where \(x_{i}\) denotes the representation of pixel \(p_{i}\) in the MRI image, \(\tilde{\omega }_{ki}\) is the normalized weight indicating the degree to which pixel \(p_{i}\) belongs to the k-th object region, \({\mathcal{I}}\) refers to the set of pixels in the MRI image, and \(f_{k}\) is the representation of the k-th object region. We apply spatial softmax to normalize each object region \(A_{k}\).
The decoder cross-attention realizes the soft target region extraction and target region representation calculation in MRI images. \(x_{i}\) is the image feature serving as key and value in the attention mechanism. The K categories correspond to queries \(q_{1} ,q_{2} , \ldots ,q_{K}\), one query per category. These K category queries are mainly used to generate the soft tissue regions, i.e., to obtain a rough segmentation of the tumors in the MRI image feature map. The weights \(\tilde{\omega }\) are the result of softmax normalization of \(A_{1} ,A_{2} , \ldots ,A_{K}\) over space. Computation of \(\tilde{\omega }\) follows the attention24 weight \(a_{ij}\), and the object region representation in Eq. (6) is computed in the same way as \(Atten\) in Eq. (8).
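A minimal PyTorch sketch of Step 2 follows: spatial softmax over each soft region map, then weighted aggregation of pixel features into the K region vectors of Eq. (6). Tensor shapes are assumptions consistent with the backbone sketch above.

```python
import torch
import torch.nn.functional as F

def region_representations(pixel_feats, soft_regions):
    """pixel_feats: (B, C, H, W) pixel representations x_i;
    soft_regions: (B, K, H, W) coarse segmentation maps A_k."""
    b, c, h, w = pixel_feats.shape
    k = soft_regions.shape[1]
    # Spatial softmax: weights sum to 1 over all pixels of each region map.
    weights = F.softmax(soft_regions.view(b, k, h * w), dim=2)
    feats = pixel_feats.view(b, c, h * w)
    # f_k = sum_i w_ki * x_i, as a batched matrix product -> (B, K, C).
    return torch.bmm(weights, feats.transpose(1, 2))
```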
Step 3. Compute contextual representations of pixels in MRI images
After aggregating the object region representations and the pixel representations, the relationship between each pixel and each tissue region in the MRI image is obtained. The pixel-region relation and the object region representations are then used to compute the contextual representation of the corresponding pixel.
The computation of the object context representation is shown in Fig. 4. The pixel features (cyan blocks) at each location in the MRI image are combined with the class region features (red bars) to establish the relationship between each pixel and the corresponding tissue region (yellow block R). Specifically, the relationship between each pixel and each target region in the bone tumor MRI images was calculated as follows.
$$R_{ik} = \frac{{e^{{{\mathcal{K}}\left( {x_{i} ,f_{k} } \right)}} }}{{\sum\nolimits_{j = 1}^{K} {e^{{{\mathcal{K}}\left( {x_{i} ,f_{j} } \right)}} } }}$$

where \({\mathcal{K}}\left( {x,f} \right) = \sigma \left( x \right)^{T} \varphi \left( f \right)\) is the unnormalized relation function, and \(\sigma \left( \cdot \right)\) and \(\varphi \left( \cdot \right)\) are two transformation functions implemented as 1 × 1 convolution → Batch Normalization (BN) → ReLU. Inspired by self-attention in Transformers24, where Q = K = V, this approach better estimates inter-pixel dependencies and is simple enough to be computed in parallel, which significantly enhances the efficiency of bone tumor MRI image segmentation.
Based on the above analysis, the object context representation of any pixel p in an MRI image is calculated according to the following equation:
$$y_{i} = \rho \left( {\sum\limits_{k = 1}^{K} {R_{ik} \delta \left( {f_{k} } \right)} } \right)$$(10)

where \(R_{ik}\) denotes the relationship between the i-th pixel and the k-th object region. Both \(\delta \left( \cdot \right)\) and \(\rho \left( \cdot \right)\) are transformation functions implemented as 1 × 1 convolution → Batch Normalization (BN) → ReLU. To better address the limitation of local operations in perceiving global context, we adopt the non-local block. The non-local block can accommodate inputs of arbitrary size while ensuring that the output matches the dimensions of the original image. Furthermore, it can be incorporated as a component into the network proposed in this study.
The encoder cross-attention mechanism and feed-forward network (FFN) are used to aggregate object region representations. In Eq. (6), the keys and values in the attention mechanism correspond to the outputs of the decoder, while the query represents the image features of each point. The context aggregation formula (10) is computed in the same manner as Equation \(Atten\) (Eq. 8), with \(\rho \left( \cdot \right)\) representing the FFN operator in the Transformer model24.
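The relation and context-aggregation computations of Step 3 can be sketched as below; the channel widths and module structure are assumptions in the spirit of the OCR design, not the paper's exact implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def conv_bn_relu(c_in, c_out):
    # 1x1 convolution -> BN -> ReLU, used for sigma, phi, delta, and rho.
    return nn.Sequential(nn.Conv2d(c_in, c_out, 1, bias=False),
                         nn.BatchNorm2d(c_out),
                         nn.ReLU(inplace=True))

class PixelRegionContext(nn.Module):
    """Relation R_ik between pixels and regions, then the object context
    y_i = rho(sum_k R_ik * delta(f_k)) from Eq. (10)."""
    def __init__(self, channels=512, key_channels=256):
        super().__init__()
        self.sigma = conv_bn_relu(channels, key_channels)  # pixel transform
        self.phi = conv_bn_relu(channels, key_channels)    # region transform
        self.delta = conv_bn_relu(channels, key_channels)  # region values
        self.rho = conv_bn_relu(key_channels, channels)    # output transform

    def forward(self, x, f):
        # x: (B, C, H, W) pixel features; f: (B, K, C) region representations.
        b, c, h, w = x.shape
        f_map = f.transpose(1, 2).unsqueeze(3)             # (B, C, K, 1)
        q = self.sigma(x).view(b, -1, h * w)               # (B, C', HW)
        k = self.phi(f_map).view(b, -1, f.shape[1])        # (B, C', K)
        v = self.delta(f_map).view(b, -1, f.shape[1])      # (B, C', K)
        # R_ik: softmax over regions of K(x_i, f_k) = sigma(x)^T phi(f).
        relation = F.softmax(torch.bmm(q.transpose(1, 2), k), dim=2)
        context = torch.bmm(v, relation.transpose(1, 2)).view(b, -1, h, w)
        return self.rho(context)                           # y: (B, C, H, W)
```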
Step 4. Compute the augmented pixel representation and feed the final enhanced feature map back into the backbone network for training
The concatenated feature map is converted into an enhanced feature map after operations such as convolution, and then used for new segmentation. Two components are integrated to update the final representation of pixel \(p_{i}\): (1) the original representation \(x_{i}\), and (2) the object context representation \(y_{i}\).
$$z_{i} = g\left( {\left[ {x_{i}^{T} ,y_{i}^{T} } \right]^{T} } \right)$$

where \(g\left( \cdot \right)\) is the transformation function employed to fuse the original representation with the object context representation, implemented as 1 × 1 convolution → Batch Normalization (BN) → ReLU. The enhancement process is illustrated in the orange box of Fig. 5. In this process, the pixel-region relationships are combined with the category region features to produce a new feature map that incorporates both pixel and region relations, which is then concatenated with the original feature map.
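The fusion transform \(g\left( \cdot \right)\) then reduces the concatenated features back to the working width, as in this short sketch (channel width assumed):

```python
import torch
import torch.nn as nn

channels = 512  # assumed feature width
g = nn.Sequential(nn.Conv2d(2 * channels, channels, 1, bias=False),
                  nn.BatchNorm2d(channels),
                  nn.ReLU(inplace=True))

def enhance(x, y):
    # z_i = g([x_i; y_i]): concatenate pixel and context features, transform.
    return g(torch.cat([x, y], dim=1))
```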
The output of the segmentation model is generated by reintroducing the enhanced MRI feature maps into the backbone network. Due to the ambiguous boundaries of bone tumors, the segmentation model may produce errors when predicting these boundaries. In contrast, the model is likely to predict the tumor's interior more accurately because of the pronounced disease symptoms and distinct texture features there. To enhance the boundary quality of the segmentation results and provide the surgeon with a more accurate surgical scope for tumor resection, we optimize the model's segmentation results using SegFix. This algorithm substitutes the initially unreliable predictions of boundary pixels with the predictions of interior pixels, which aligns with the characteristics of bone tumor MRI images. It imposes no a priori conditions and can achieve real-time performance. The detailed structure of SegFix is shown in Fig. 5.
SegFix processes the tumor segmentation modifications to yield the segmentation results following model optimization. In the output phase of the segmentation result optimization model, we employ composite decision-making to integrate outputs of the same MRI image rendered from various angles and in different color schemes. This approach is primarily based on the observation that, in bone tumor diagnosis, the same MRI image can be transformed through rotation, flipping, or enhancement. Therefore, the texture features of human tissue may show different strengths, resulting in different predictions of the segmentation model. Unifying the results from the same MRI image can improve the accuracy and robustness of the output. After inverting the output, the specific calculation of the overall output result is as follows:
where W represents the width of the image M, H represents the height of the image M, and σ represents the sigmoid function. The final annotated tumor region produced by our proposed model exhibits high similarity and overlap with the ground truth manually segmented by clinicians. Clinicians utilize the results of model segmentation as a reference for diagnosing bone tumor, thereby reducing their workload in interpreting MRI images and enhancing diagnostic efficiency. Most importantly, using model segmentation results as a reference for auxiliary diagnosis can significantly reduce the likelihood of missed and misdiagnoses, thereby ensuring the quality of diagnosis and treatment.
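Since the composite-decision equation itself is not reproduced here, the following sketch conveys the idea under our reading of the text: predict on each transformed copy, invert the transform, and take the per-pixel majority (equivalently, threshold the mean sigmoid probability at 0.5). `copies` and `inverses` are hypothetical paired lists of transformed inputs and their inverse mappings.

```python
import torch

def composite_decision(model, copies, inverses):
    votes = []
    with torch.no_grad():
        for img, invert in zip(copies, inverses):
            prob = torch.sigmoid(model(img))  # per-pixel tumor probability
            votes.append(invert(prob))        # map back to original orientation
    mean_prob = torch.stack(votes).mean(dim=0)
    return (mean_prob > 0.5).float()          # majority prediction per pixel
```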
However, there is a data imbalance in MRI images of bone tumor patients, which leads to model training characterized by high precision and low recall. In clinical practice, missing part of the bone tumor region is a more critical error than misclassifying normal tissue as a lesion. Tversky Loss13 generalizes Dice Loss and the Jaccard coefficient; it effectively addresses the issue of imbalanced bone tumor data and establishes a better balance between precision and recall. In this paper, Tversky Loss is employed as the loss function, calculated as follows:
$$L_{Tversky} = 1 - \frac{{\sum\nolimits_{i} {p_{i}^{os} g_{i}^{os} } + \mu }}{{\sum\nolimits_{i} {p_{i}^{os} g_{i}^{os} } + \alpha \sum\nolimits_{i} {p_{i}^{os} \left( {1 - g_{i}^{os} } \right)} + \beta \sum\nolimits_{i} {\left( {1 - p_{i}^{os} } \right)g_{i}^{os} } + \mu }}$$

where \(p_{i}^{os}\) represents the probability that the i-th voxel is a bone tumor, \(g_{i}^{os}\) is 1 for lesion voxels, and \(\mu\) is a small constant (\(10^{-8}\)) added to prevent division by zero. \(\alpha\) and \(\beta\) control the extent to which the loss function penalizes false positives (FP) and false negatives (FN), respectively, so the trade-off between FP and FN can be managed by adjusting these parameters. Notably, when α = β = 0.5, the Tversky index is equivalent to the Dice coefficient, which also corresponds to the \(F_{1}\) score. As β increases, the loss function penalizes false negatives more, yielding higher recall but lower specificity. By adjusting the coefficients, we can achieve effective segmentation of both very small and large tumor regions. Experimental results indicate that α = 0.25 and β = 0.75 give our proposed method the best segmentation similarity and overlap, so these values are employed in this paper.
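A compact PyTorch version of this loss, written under the standard Tversky formulation with the paper's α = 0.25 and β = 0.75, might look as follows:

```python
import torch

def tversky_loss(pred, target, alpha=0.25, beta=0.75, mu=1e-8):
    """pred: per-voxel tumor probabilities p_i; target: binary labels g_i."""
    pred, target = pred.reshape(-1), target.reshape(-1)
    tp = (pred * target).sum()        # true positives
    fp = (pred * (1 - target)).sum()  # false positives, weighted by alpha
    fn = ((1 - pred) * target).sum()  # false negatives, weighted by beta
    index = (tp + mu) / (tp + alpha * fp + beta * fn + mu)
    return 1 - index                  # alpha = beta = 0.5 recovers Dice loss
```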
Experiments and analysis
Dataset
[Dataset 1]: All data in this study were obtained from the Second People’s Hospital of Huaihua City4. During the experiment, we collected over 80,000 MRI images of bone tumors and related index data from 204 patients with varying degrees of bone tumors, as illustrated in Fig. 6. To address the varying image sizes, we employed a bicubic interpolation algorithm to normalize all images to a consistent resolution of 512 × 512. We partitioned the dataset into a training set and a test set in a ratio of 7:3. Consequently, the training set comprised 69.9% of the data, while the test set accounted for 30.1%.
A total of 80,350 MRI images were obtained from 204 patients with bone tumors. Of these, 56,164 images are included in the training set, 8,196 in the validation set, and 15,990 in the test set. To enhance the generalization performance and robustness of the model, we randomly rotated, flipped, and augmented the original dataset to create an expanded dataset. The final dataset is categorized into three sub-datasets: the cross-sectional dataset (14.2%), sagittal dataset (66.9%), and coronal dataset (18.9%). Each image consists of two sequences: T1 and T2 images. Training was conducted sequentially on the three datasets, and the final results were averaged across these categories.
[Dataset 2]: MRI hippocampus segmentation dataset25. To verify the generality of the RCROS model, we added an MRI hippocampus segmentation dataset for experiments. This dataset contains 135 hippocampus segmentation images, of which 100 are used as training data and 35 are used as test data.
In addition, we adopted fivefold cross-validation to ensure the accuracy and reliability of the model evaluation. The model was trained for seven to eight hours, encompassing a total of 450 epochs. We used the Adam optimizer during training, starting with a learning rate of 0.001. After 300 epochs, the learning rate was lowered to 0.0001, and CosineAnnealingLR was then used to adjust it dynamically for the remaining epochs.
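A sketch of this schedule in PyTorch is given below; the model stand-in and the exact hand-off between the step drop and the cosine annealing are assumptions about details the paper leaves open.

```python
import torch
from torch.optim import Adam
from torch.optim.lr_scheduler import CosineAnnealingLR

model = torch.nn.Conv2d(1, 2, kernel_size=1)  # stand-in for the RCROS network
optimizer = Adam(model.parameters(), lr=1e-3)
scheduler = None

for epoch in range(450):
    # ... one training pass over the current cross-validation fold ...
    if epoch == 299:
        for group in optimizer.param_groups:
            group["lr"] = 1e-4  # step the learning rate down after 300 epochs
        scheduler = CosineAnnealingLR(optimizer, T_max=150, eta_min=1e-6)
    if scheduler is not None:
        scheduler.step()  # cosine annealing for the remaining epochs
```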
Evaluation index
In the experiments conducted in this study, we selected the following metrics to qualitatively and quantitatively evaluate the results at each epoch. These metrics include accuracy (\(mAcc_{os}\)), precision (\(mPre_{os}\)), recall (\(mRe_{os}\)), F1-Score (\(mF1_{os}\)), Intersection over Union (\(mIOU_{os}\)), Dice Similarity Coefficient (\(mDSC_{os}\)), and Floating Point Operations (FLOPs). A detailed analysis of these metrics is presented in Table 2.
where \(S_{pre}\) represents the tumor region segmented by the model and \(S_{truth}\) denotes the real tumor region manually segmented by the doctor. Additionally, we denote the number of model parameters as Params; a larger value indicates that the model requires more storage space. Floating Point Operations (FLOPs) are employed to assess the computational complexity of the model. In the clinical diagnosis of bone tumors, both omissions and misdiagnoses are critical errors. Therefore, when segmenting MRI images of bone tumors, we aim to enhance the recall rate to minimize the risk of missed diagnoses.
Results and discussion
We first conducted ablation studies on the model. The results of each model on the osteosarcoma MRI dataset are shown in Fig. 7. The leftmost column (a) shows the tumor region manually labeled by the radiologist. Column (b) presents the results from unprocessed data, which was directly input into the bone tumor segmentation model without boundary optimization. Column (c) displays the results from preprocessed data, though the segmentation results have not undergone edge optimization. The rightmost column (d) illustrates the segmentation results obtained from the RCROS system proposed in this study, where the data have been preprocessed and the segmentation results are edge-optimized. As illustrated in the figure, when the data are not preprocessed, irregularities and blurriness in the tumor edges of MRI images may lead to over-segmentation or under-segmentation. Additionally, the blurriness of bone tumor edges in MRI images may result in incorrect predictions of edge pixels or subtle gaps. Overall, the results indicate that the segmentation outcomes produced by the RCROS method closely align with the regions labeled by the physician.
To quantitatively analyze the performance of the RCROS model before and after optimization on our osteosarcoma MRI dataset, we provide a detailed comparison of the indicators in Table 3 and Fig. 8. The accuracy of the dataset without preprocessing and edge optimization is 0.9756. The trained RCROS model achieves an accuracy of 0.9943 and a recall of 0.9426, with a computational complexity of only 125.67G FLOPs. Overall, our preprocessing and edge optimization improve the model's segmentation performance while maintaining low computational complexity.
To further evaluate the performance of our method in bone tumor MRI image segmentation, we compared it with several classical algorithms. The comparison includes advanced segmentation models such as PSPNet11, U-Net14, FPN26, and FCN-8s27, as well as studies focused on bone tumor CT images, specifically MSFCN28 and MSRN29. Additionally, we included recognition methods for bone tumor MRI images, including DCHS30 and MTCF31. To ensure fairness and accuracy in our comparisons, we conducted experiments using the dataset from this study under the same conditions.
Figure 9 shows the segmentation performance of each model on bone tumor data. The figure clearly indicates substantial differences in accuracy among the various methods. Among these methods, the segmentation performance of the DCHS method is the least effective. Particularly for irregularly shaped tumors, the DCHS segmentation model often inaccurately identifies lesions, misclassifying them as normal tissues. The MTCF method demonstrates a notable improvement in segmentation; however, it still fails to achieve precise delineation of tumor boundaries. Several other models exhibit over-segmentation, primarily due to inaccuracies arising from blurred tumor boundaries. Our method not only accurately locates tumor regions but also effectively addresses the irregularity and ambiguity of bone tumor boundaries. Even for tumors with irregular shapes, sizes, and ambiguous edges, our RCROS method provides superior segmentation and localization. Overall, our method achieves superior segmentation performance. This demonstrates that our proposed method serves as a more suitable reference for clinical diagnosis by physicians.
Comparison of segmentation performance of different methods. Column (a) illustrates that bone tumors appear at various scales and shapes in MRI images. Column (b) displays the tumor regions manually labeled by the physician. Columns (c) to (k) illustrate the segmentation outcomes of the different models.
In addition to the visual comparison, we present the average results of the individual model tests to compare the performance of each model more accurately, as shown in Table 4. Our method outperforms most other models across several metrics. Even without image preprocessing and result optimization, our method demonstrates superior performance metrics compared to other segmentation models; with preprocessing and result optimization applied, it achieves higher values across most metrics. Notably, our model shows significant improvements in the IOU, DSC, and Recall metrics compared to the best classical segmentation model. The segmentation results closely align with the tumor regions manually delineated by physicians, indicating that our method outperforms classical models in generalization and robustness. The Recall value of our method approaches 0.95, making missed diagnoses unlikely in actual clinical practice, which is highly beneficial for assisting physicians in clinical diagnosis. In particular, among the auxiliary segmentation methods for bone tumors, the DCHS method has the lowest indexes, with DSC and IOU values of only about 0.7, indicating poor segmentation; this coincides with the segmentation results shown in Fig. 9. Moreover, the variance results show that the stability of the models differs, and our method shows better stability. This paper uses the Wilcoxon signed-rank test to determine whether the performance indicators of RCROS differ significantly from those of other models. Taking the DSC of our model and U-Net as an example, p < 0.05, indicating a significant performance difference between the two. Compared with DCHS, MTCF shows significant improvement in all indexes, but its precision is still only 0.88. The MSFCN and MSRN methods achieve better segmentation but have higher FLOPs and require more computational cost.
To intuitively evaluate the performance of each segmentation network on the bone tumor dataset, Fig. 10 illustrates the relationship between the DSC and IOU values of various methods and the number of parameters. The FCN-8s model requires the highest number of parameters, indicating that it has greater hardware requirements for segmenting MRI images of bone tumors. Although the FPN model demonstrates higher similarity (DSC) and overlap (IOU) with tumor regions manually segmented by physicians, it requires a substantial number of parameters. This high demand on memory and CPU makes it unsuitable for segmenting large-capacity MRI images of bone tumor patients. The U-Net model performs better in segmenting MRI images. Although the U-Net model has more parameters than the MSRN model, its segmentation performance is significantly better. The MSFCN and MSRN methods, applied for detecting bone tumors in CT images, yield better results, whereas the DCHS indicators are the lowest. Our proposed bone tumor segmentation model not only achieves the highest DSC and IOU values but also requires the fewest parameters. The DSC reaches 0.9, the IOU reaches 0.88, and the number of parameters is only 7.95 million. This indicates that the predicted tumor region of our RCROS model closely aligns with the manual segmentation results by physicians and that its memory and CPU requirements are minimal. This greatly reduces hardware requirements in real-world environments, leading to lower training costs.
Figure 11 illustrates the relationship between FLOPs and DSC and IOU values of different models in the bone tumor dataset. The DCHS method exhibits the lowest individual metrics compared to the other models, with values around 0.7. The MTCF model shows significant improvement while maintaining lower FLOPs. Although the MSFCN and MSRN models achieve higher DSC and IOU values, they are computationally expensive to train. In comparison, both the FCN-8s and PSPNet models exhibit low computational complexity. However, the similarity between the tumor region segmented by the FCN-8s model and the ground truth is low, as indicated by the small DSC value. Additionally, the PSPNet segmentation exhibits a low overlap with the region delineated by physicians, as reflected in the small IOU value. Both situations are unfavorable in actual clinical practice. Our segmentation system significantly improves the DSC and IOU values without incurring excessive computational cost. It achieves an effective balance between speed and accuracy to obtain the best segmentation results.
In the clinical diagnosis of bone tumors, missing a tumor area is a more serious error than misdiagnosing normal tissue as tumor. The recall rates, parameters, and FLOPs of the different networks are shown in Fig. 12. The figure indicates that the DCHS method has the lowest recall, despite having fewer than 30 million parameters and fewer than 400G FLOPs. The MSRN and MSFCN models exhibit higher recall but also entail significant computational complexity. The U-Net and FPN models demonstrate better performance. Overall, our model achieves the highest recall at 0.94 while maintaining lower device requirements, thus achieving a balance between speed and precision.
To evaluate the universality of the model, the segmentation results of each model on the hippocampus MRI dataset are shown in Table 5. Among them, the method of K. Somasundaram et al. performs worst, with an IOU of only 0.7213, followed by the FPN model with an IOU of 0.8316. Our model's results are close to those reported in Hazarika's research; both perform well in hippocampus MRI recognition.
The time required to train each model on 300 osteosarcoma MRI images is shown in Table 6. As the table shows, the MSRN and MSFCN models require the longest training time, mainly due to the high computational density of MSFCN and the increased communication overhead of MSRN's residual connections. The DCHS and MTCF models are next, with a training time of about 5 min per epoch. By comparison, our model has the shortest training time.
Conclusion
This paper proposes a bone tumor assisted segmentation system (RCROS) based on regional and contextual representation, which includes data preprocessing, segmentation model and prediction result boundary optimization. The segmentation model uses context and regional representation in bone tumor MRI images to enhance the representation of each pixel. Experiments show that our method strikes a balance between segmentation accuracy, training time and computational cost. Compared with existing bone tumor segmentation models, our method is more suitable for universal application in actual medical environments.
However, the performance of the RCROS method may depend heavily on the quality and diversity of the dataset. If the training dataset is not sufficiently comprehensive or representative, the model's generalization ability in practical applications may be limited. In the future, with the continuous expansion of patient datasets and the development of artificial intelligence, we will combine cross-domain data and improve the generalization of the model through transfer learning. The use of massive unlabeled data will also be a research focus.
Data availability
Data used to support the findings of this study are currently under embargo while the research findings are commercialized. Requests for data, 12 months after publication of this article, will be considered by the corresponding author.
References
Anisuzzaman, D. M., Barzekar, H., Tong, L., Luo, J. & Yu, Z. A deep learning study on osteosarcoma detection from histological images. Biomed. Signal Process. Control 69(5), 102931 (2021).
Wu, J., Chang, L. & Yu, G. Effective data decision-making and transmission system based on mobile health for chronic disease management in the elderly. IEEE Syst. J. 15(4), 5537–5548. https://doi.org/10.1109/JSYST.2020.3024816 (2021).
Zhan, X. et al. A convolutional neural network-based intelligent medical system with sensors for assistive diagnosis and decision-making in non-small cell lung cancer. Sensors 21(23), 7996. https://doi.org/10.3390/s21237996 (2021).
Wu, J. et al. Intelligent segmentation medical assistance system for MRI images of osteosarcoma in developing countries. Comput. Math. Methods Med. 2022, 7703583. https://doi.org/10.1155/2022/7703583 (2022).
Yang, X., Zhang, Y., Lo, B., Wu, D. & Zhang, Y. DBAN: Adversarial network with multi-scale features for cardiac MRI segmentation. IEEE J. Biomed. Health Inform. https://doi.org/10.1109/JBHI.2020.3028463 (2020).
Lou, A., Guan, S. & Loew, M. DC-UNet: Rethinking the U-Net architecture with dual channel efficient CNN for medical image segmentation. In Medical Imaging 2021: Image Processing. 11596 758-768 (SPIE, 2021).
Chang, L., Wu, J., Moustafa, N., Bashir, A. K. & Yu, K. AI-driven synthetic biology for non-small cell lung cancer drug effectiveness-cost analysis in intelligent assisted medical systems. IEEE J. Biomed. Health Inform. https://doi.org/10.1109/JBHI.2021.3133455 (2021).
Cui, K. Dynamic MRI reconstruction via weighted tensor nuclear norm regularizer. IEEE J. Biomed. Health Inform. 25(8), 3052–3060. https://doi.org/10.1109/JBHI.2021.3061793 (2021).
Sandler, M., Howard, A., Zhu, M., Zhmoginov, A. & Chen, L. C. MobileNetV2: Inverted residuals and linear bottlenecks. In 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition 4510–4520 https://doi.org/10.1109/CVPR.2018.00474 (2018).
Sampedro-Puente, D. A. et al. Characterization of spatio-temporal cardiac action potential variability at baseline and under β-adrenergic stimulation by combined unscented Kalman filter and double greedy dimension reduction. IEEE J. Biomed. Health Inform. 25(1), 276–288. https://doi.org/10.1109/JBHI.2020.2984647 (2021).
Luo, Z., Jia, Z., Yuan, Z. & Peng, J. HDC-Net: Hierarchical decoupled convolution network for brain tumor segmentation. IEEE J. Biomed. Health Inform. 25(3), 737–745. https://doi.org/10.1109/JBHI.2020.2998146 (2021).
Zhao, H., Shi, J., Qi, X., Wang, X. & Jia, J. Pyramid scene parsing network. In 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 6230–6239 https://doi.org/10.1109/CVPR.2017.660 (2017).
Yang, M., Yu, K., Zhang, C., Li, Z. & Yang, K. DenseASPP for semantic segmentation in street scenes. In 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition 3684–3692 https://doi.org/10.1109/CVPR.2018.00388 (2018).
Ronneberger, O., Fischer, P. & Brox, T. U-Net: Convolutional networks for biomedical image segmentation. In Medical Image Computing and Computer-Assisted Intervention – MICCAI 2015, Lecture Notes in Computer Science 9351 (eds Navab, N., Hornegger, J., Wells, W. & Frangi, A.) https://doi.org/10.1007/978-3-319-24574-4_28 (Springer, Cham, 2015).
Zhou, Z. UNet++: A nested U-Net architecture for medical image segmentation. In Deep Learning in Medical Image Analysis (DLMIA) 3–11 https://doi.org/10.1007/978-3-030-00889-5_1 (Granada, Spain, 2018).
Chen, J. et al. TransUNet: Transformers make strong encoders for medical image segmentation. arXiv preprint arXiv:2102.04306 (2021).
Cao, H. et al. Swin-Unet: Unet-like pure transformer for medical image segmentation. In European Conference on Computer Vision 205–218 (Springer Nature, Cham, 2022).
Gao, Y., Zhou, M. & Metaxas, D. N. UTNet: A hybrid transformer architecture for medical image segmentation. In Medical Image Computing and Computer Assisted Intervention – MICCAI 2021, Proceedings, Part III, 61–71 (Springer International Publishing, 2021).
Ji, Y. et al. Multi-compound transformer for accurate biomedical image segmentation. In Medical Image Computing and Computer Assisted Intervention – MICCAI 2021, Proceedings, Part I, 326–336 (Springer International Publishing, 2021).
Anisuzzaman, D., Barzekar, H., Tong, L., Luo, J. & Yu, Z. A deep learning study on osteosarcoma detection from histological images. Biomed. Signal Process. Control 69, 102931 (2021).
Barzekar, H. & Yu, Z. C-Net: A reliable convolutional neural network for biomedical image classification. Expert. Syst. Appl. 187, 116003 (2022).
Nabid, R., Rahman, M. & Hossain, M. F. Classification of osteosarcoma tumor from histological image using sequential RCNN. In Proceedings of the 2020 11th International Conference on Electrical and Computer Engineering (ICECE) 17–19 (Dhaka, Bangladesh, 2020).
Parlak, S. et al. Diffusion-weighted imaging for the differentiation of Ewing sarcoma from osteosarcoma. Skelet. Radiol. 50, 2023–2030 (2021).
Vaswani, A. et al. Attention is all you need. In Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems 2017 (eds Guyon, I. et al.) 5998–6008 https://proceedings.neurips.cc/paper/2017/hash/3f5ee243547dee91fbd053c1c4a845aa-Abstract.html (Long Beach, CA, USA, 2017).
Lin, T., Dollár, P., Girshick, R., He, K., Hariharan, B. & Belongie, S. Feature pyramid networks for object detection. In 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 936–944 https://doi.org/10.1109/CVPR.2017.106 (2017).
Shelhamer, E., Long, J. & Darrell, T. Fully convolutional networks for semantic segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 39(4), 640–651. https://doi.org/10.1109/TPAMI.2016.2572683 (2017).
Huang, L., Xia, W., Zhang, B., Qiu, B. & Gao, X. MSFCN: Multiple supervised fully convolutional networks for the osteosarcoma segmentation of CT images. Comput. Methods Programs Biomed. 143, 67–74. https://doi.org/10.1016/j.cmpb.2017.02.013 (2017).
Rui, Z., Lin, H., Wei, X., Bo, Z. & Qiu, B. Multiple supervised residual network for osteosarcoma segmentation in CT images. Comput. Med. Imaging Graph. 63, 1–8 (2018).
Mandava, R., Alia, O. M., Wei, B. C., Ramachandram, D., Aziz, M. E. & Shuaib, I. L. Osteosarcoma segmentation in MRI using dynamic harmony search based clustering. In 2010 International Conference of Soft Computing and Pattern Recognition 423–429 https://doi.org/10.1109/SOCPAR.2010.5686624 (2010).
Huang, W., Wen, D., Yan, Y., Yuan, M. & Wang, K. Multi-target osteosarcoma MRI recognition with texture context features based on CRF. In 2016 International Joint Conference on Neural Networks (IJCNN) 3978–3983 https://doi.org/10.1109/IJCNN.2016.7727716 (2016).
Somasundaram, K. & Genishy, T. An atlas-based approach to segment the hippocampus from MRI of human head scans for the diagnosis of Alzheimer's disease. Int. J. Comput. Intell. Inform. https://doi.org/10.1016/j.zemedi.2018.11.002 (2015).
Hazarika, R. A. et al. Hippocampus segmentation using U-Net convolutional network from brain magnetic resonance imaging (MRI). J. Digit. Imaging 35, 893–909. https://doi.org/10.1007/s10278-022-00613-y (2022).
Author information
Contributions
Writing—original draft, Y. L.; writing—review & editing, J. L., T. D., F. G. All authors have read and agreed to the published version of the manuscript.
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Liu, Y., Liu, J., Dai, T. et al. Bone tumor recognition strategy based on object region and context representation in medical decision-making system. Sci Rep 15, 9869 (2025). https://doi.org/10.1038/s41598-025-94213-9