Introduction

Cervical cancer ranks fourth globally in both incidence and mortality among women1. According to the World Health Organization (WHO), approximately 660,000 women are diagnosed with cervical cancer, and approximately 350,000 women die from the disease2. The primary cause of cervical cancer is persistent infection with human papillomavirus (HPV)3. Timely detection and stage classification are crucial3,4. However, one of the major challenges in early detection is that patients with cervical cancer typically show few or no symptoms during the initial stages. This makes quick and accurate stage classification essential, as it allows patients to receive appropriate treatment and reduces associated risks. To prevent and eliminate cervical cancer, the WHO recommends HPV vaccination for girls aged 9–14 and regular cervical screening for women5,6. After cervical screening, such as Pap and HPV tests, colposcopy is performed on women with abnormal cytology or positive HPV test results so that they can receive more appropriate care. Colposcopy is the most commonly used technique for diagnosing cervical cancer7. It is performed by trained specialists, such as colposcopists, using acetic acid and a colposcope. After 3% or 5% acetic acid is applied to the transformation zone (TZ) epithelium, regions that appear white are more likely to be precancerous8,9,10. The more severe the lesion, the denser the whiteness, which is sometimes called oyster white10. Many healthcare professionals, from advanced practice providers to gynecological oncologists, are involved in performing colposcopies. However, the procedure often suffers from a lack of standardized protocols, and there is a significant gap in both the necessary training and the ongoing skill development required to maintain proficiency11.
The ability to grade cervical lesions during a colposcopic examination depends largely on the colposcopist’s experience11,12, which means that the subjective judgment of an individual colposcopist can influence the results. Furthermore, over 80% of cervical cancer cases occur in low-income countries, where access to skilled colposcopists is limited13. The integration of artificial intelligence (AI) presents a promising way to mitigate the impact of cervical cancer in low-income countries and to standardize procedures. Because AI is expected to outperform humans in identifying recurring features, many researchers have attempted to integrate AI into medical fields such as cancer imaging and classification14,15. Machine learning, an AI method, is used in healthcare to analyze and interpret images to help diagnose diseases and plan treatments. This technology can increase the accuracy and speed of diagnoses, support personalized treatment plans, and predict patient outcomes. Various studies on the use of cervicographic images and AI in cervical cancer diagnosis have shown remarkable outcomes16.

Ouh et al. developed an AI-based analysis software, CerviCARE AI, that achieved 98% accuracy for high-risk groups. In total, 11,500 images with negative and atypical data and 11,225 images with positive data were used. CerviCARE AI showed a sensitivity of 98% for high-risk groups and 95.5% sensitivity17. Nurmaini et al. developed a real-time mobile AI-assisted cervicography interpretation system. It was devised for use in portable devices such as smartphones in areas with less access to experienced experts. The model had excellent inference time and accuracy, achieving more than 98% precision, sensitivity, and AP18. Acosta-Mesa et al. used machine learning to classify colposcopic images into normal and abnormal cervical cancer stages. The k-nearest neighbor machine learning model (k-NN) was used to compare the temporal patterns of the acetic whiteness areas. Their model showed a sensitivity of 71% and specificity of 59%19.

Although acetowhite is a crucial element in visually staging cervical cancer, the potential of acetowhite areas to classify cancer grading remains underexplored. Although research has explored AI applications for cervical cancer detection, current models do not incorporate acetowhite regions as a feature17. Further studies on the correlation between acetic whiteness and cervical cancer stage are required. Additionally, there are differences in the distinction and irregularity between an acetowhite lesion and its surrounding areas depending on the stage of cervical cancer progression10,19. Although these features can be crucial for effective grading, there remains a lack of research on how the area surrounding an acetowhite lesion can be used for cervical cancer grading.

The goal of this model is to help patients with severe cervical cancer receive timely and appropriate treatment. Therefore, this study aimed to improve the accuracy and precision of cervical cancer diagnosis and grading by effectively exploiting the acetowhite region and its surroundings. We hope to increase the reliability of cervical cancer diagnosis by exploiting important information that may have been missed by existing diagnostic models.

Materials and methods

The Institutional Review Board (IRB) of Gachon University Gil Medical Center approved this study (IRB No: GDIRB2020-346) and waived the requirement for informed consent owing to the retrospective nature of the study. All experimental protocols adhered to the relevant guidelines and regulations outlined in the Declaration of Helsinki. The dataset comprised three classes: Atypical, Positive 1, and Positive 2. A total of 464 cervicography images were used in this study, and each image was obtained after the application of acetic acid to the cervix. Examples of the colposcopic images used in this study are shown in Fig. 1, and the composition of the data in the training set is listed in Table 1. Three gynecology specialists labeled the cervicographic images that exhibited an acetowhite reaction. Data were used only when the labeling results of at least two of the three specialists agreed. The images were taken using Dr. Cervicam. The images varied in size, but most had a width of 1504 pixels and a height of 1000 pixels.

Fig. 1

Examples of colposcopic images. All images were taken after acetic acid solution application; (a) Colposcopic image labeled Atypical; (b) Colposcopic image labeled Positive 1; (c) Colposcopic image labeled Positive 2.

Table 1 The composition of data used for training.

This study used a system consisting of four graphics processing units (NVIDIA TITAN Xp, Santa Clara, California, USA), an Intel Xeon E5-1620 v4 CPU, and 251 GB of RAM. PyRadiomics 3.4.2 was used as the open-source library for this research. The first-order, GLCM, GLRLM, and GLSZM feature classes were used for feature extraction.

Figure 2 illustrates the overall workflow of this study. To examine the role of the acetowhite area and its surroundings in cervical cancer labeling, two types of mask images with different degrees of focus on the acetowhite area were used. The first mask includes only the acetowhite lesion, so the areas adjacent to the lesion are black. The second mask includes the acetowhite lesion plus 10 pixels along the margin of the lesion; it was obtained by applying a dilation function to expand the first mask by 10 pixels around the acetowhite area. After the mask images were extracted, overlay images were used to verify that the masks had been extracted correctly.

Radiomics encompasses a variety of techniques used for extracting quantitative features from medical images, such as MRI, CT, ultrasound, and single-photon emission computed tomography20,21,22. It extracts and analyzes features from medical images that are difficult to discern with the naked eye and represents them as quantified values. Radiomics helps improve the diagnostic, prognostic, and predictive accuracy of image interpretation23. In this study, radiomics was used to extract the features of the acetowhite areas at different cervical cancer stages. The acetowhite reaction becomes brighter as the cervical cancer becomes more severe9. Therefore, the brightness of the lesion is important for classifying cervical cancer based on the acetowhite reaction. Cervical images, including those showing the acetowhite reaction, generally appear reddish to reddish white. To examine precisely the difference in brightness and whether the lesion appears brighter in reddish images, this study used the blue channel of the RGB image.

Four radiomics feature classes were used in this study. From the first-order class, 18 features were extracted using the distribution of voxel intensities within the image24. From the gray-level co-occurrence matrix (GLCM), which characterizes texture based on the relationship between two neighboring pixels within an image, 24 features were extracted. From the gray-level run length matrix (GLRLM), which records the lengths of runs of consecutive pixels with the same gray level, 16 features were extracted24. From the gray-level size zone matrix (GLSZM), which counts zones of connected pixels that share the same gray-level value, 16 features were extracted24. In summary, 75 features were extracted from the colposcopic images.
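The mask preparation described above can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: the source does not specify the structuring element or the library used for dilation, so the sketch assumes repeated 3×3 (8-connected) dilation, and the function names `blue_channel`, `dilate_mask`, and `apply_mask` as well as the synthetic image are hypothetical.

```python
import numpy as np

def blue_channel(rgb):
    """Return the blue plane of an H x W x 3 RGB array."""
    return rgb[..., 2]

def dilate_mask(mask, margin=10):
    """Grow a binary mask by `margin` pixels.

    Assumption: one 3x3 (8-connected) dilation per pixel of margin;
    the structuring element used in the study is not stated.
    """
    out = mask.astype(bool)
    height, width = out.shape
    for _ in range(margin):
        padded = np.pad(out, 1, mode="constant", constant_values=False)
        grown = np.zeros_like(out)
        # OR together the nine one-pixel shifts of the current mask.
        for dy in (0, 1, 2):
            for dx in (0, 1, 2):
                grown |= padded[dy:dy + height, dx:dx + width]
        out = grown
    return out

def apply_mask(channel, mask):
    """Keep intensities inside the mask and set everything else to black."""
    return np.where(mask, channel, 0)

# Synthetic stand-in for a reddish cervical image with a one-pixel "lesion".
rgb = np.full((50, 50, 3), (200, 120, 90), dtype=np.uint8)
lesion = np.zeros((50, 50), dtype=bool)
lesion[25, 25] = True

expanded = dilate_mask(lesion, margin=10)
roi_lesion = apply_mask(blue_channel(rgb), lesion)      # Exp 1 style input
roi_expanded = apply_mask(blue_channel(rgb), expanded)  # Exp 2 style input
```

With a square structuring element, a single lesion pixel grows into a 21 × 21 block, which makes the 10-pixel margin easy to check by eye on an overlay.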

Fig. 2

Flow chart of the research: (Exp 1) From the input images, mask images that represent only the acetowhite lesion areas are obtained and used to extract radiomics features. After feature extraction and feature selection, five features were selected for cervical cancer classification. Four different classifiers were trained using the selected features and then used to classify the test images and produce results. (Exp 2) follows the same procedure as Exp 1, except that the mask images were expanded by 10 pixels along the margin of the lesion.

When training machine learning models, selecting the features to train on is as important as extracting features from the data, because selecting appropriate features helps to reduce noise and lower the error rate. This study employed Recursive Feature Elimination (RFE) as the feature selection method. RFE iteratively removes the least important features from the entire set of extracted features until only the desired number of important features remains. Through this process, five features of high importance were selected from the 75 features. For the mask images capturing only the acetowhite region, the selected features were small-area emphasis, Size Zone NonUniformity Normalized, 90th percentile, Gray Level NonUniformity Normalized, and skewness. For the mask images extended by 10 pixels, the selected features were small-area emphasis, 90th percentile, Size Zone NonUniformity Normalized, skewness, and autocorrelation. Four classifiers were employed to classify atypical and positive stages in cervicographic images: logistic regression (LR), which analyzes the relationship between multiple independent variables and a single dependent variable25; random forest (RF), which aggregates the results of multiple decision trees to derive a single outcome26; XGBoost (XGB), which improves the performance and speed of machine learning using gradient boosting27; and support vector machine (SVM), which classifies data in a high-dimensional feature space28. The training dataset comprised 329 Atypical, 417 Positive 1, and 277 Positive 2 samples. The images were checked for errors, and no errors were found. We compared the classification results of three gynecological specialists with those of the machine learning models to evaluate the performance of cervical stage classification using either the acetowhite region alone or the acetowhite region and its surrounding areas in cervicographic images.
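The elimination loop behind RFE, as described above, can be illustrated with a small NumPy sketch. The source does not state which estimator ranked the features, so this example assumes a plain least-squares linear model on standardized features; the function name `rfe_select` and the synthetic data are hypothetical and stand in for the radiomics feature table.

```python
import numpy as np

def rfe_select(X, y, n_keep=5):
    """Recursive feature elimination with a least-squares linear ranker.

    Assumption: feature importance is the absolute coefficient of a linear
    fit on standardized features; the study's actual ranker is not stated.
    """
    X = (X - X.mean(axis=0)) / X.std(axis=0)  # make coefficients comparable
    y = y - y.mean()
    remaining = list(range(X.shape[1]))
    while len(remaining) > n_keep:
        coef, *_ = np.linalg.lstsq(X[:, remaining], y, rcond=None)
        weakest = int(np.argmin(np.abs(coef)))  # least important feature
        del remaining[weakest]                  # drop it and refit
    return remaining

# Synthetic data: 12 features, of which only columns 1, 4 and 7 carry signal.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 12))
score = 2.0 * X[:, 1] - 2.0 * X[:, 4] + 2.0 * X[:, 7]
y = (score > 0).astype(float)  # binary labels, as in a two-class setting

selected = rfe_select(X, y, n_keep=3)
```

Refitting after every removal is what distinguishes RFE from a one-shot ranking: a feature that looks weak only because it is correlated with another can regain importance once its partner has been eliminated.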
By comparing the specialist classifications with those of the models, we categorized the results into true positive (TP), false negative (FN), true negative (TN), and false positive (FP) classifications. The accuracy, sensitivity, and precision used for the classification evaluation were calculated using Eqs. (1)–(3).

$$\text{Accuracy} = \frac{\text{correct classification}}{\text{total classification}} \times 100 = \frac{TP + TN}{TP + TN + FP + FN} \times 100$$
(1)
$$\text{Sensitivity} = \frac{TP}{TP + FN} \times 100$$
(2)
$$\text{Precision} = \frac{\text{correctly classified actual positive}}{\text{everything classified as positive}} = \frac{TP}{TP + FP} \times 100$$
(3)
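As a worked example of Eqs. (1)–(3), the following sketch computes the three metrics from confusion-matrix counts. The function name and the counts are hypothetical and are not taken from the study's results.

```python
def classification_metrics(tp, tn, fp, fn):
    """Accuracy, sensitivity and precision in percent, following Eqs. (1)-(3)."""
    total = tp + tn + fp + fn
    return {
        "accuracy": 100.0 * (tp + tn) / total,
        "sensitivity": 100.0 * tp / (tp + fn),
        "precision": 100.0 * tp / (tp + fp),
    }

# Hypothetical counts for illustration only.
metrics = classification_metrics(tp=40, tn=45, fp=5, fn=10)
```

With these counts, 85 of 100 cases are classified correctly, 40 of the 50 actual positives are detected, and 40 of the 45 positive predictions are correct.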

To compare the performances of the models, we plotted the Receiver Operating Characteristic (ROC) curve, which represents the classification performance of each model, and calculated the Area Under the Curve (AUC). The AUC score ranges from 0 to 1, with values closer to 1 indicating a better model performance.
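The AUC can also be read as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case. The sketch below computes it directly from that definition; the function name, labels, and scores are hypothetical, and the pairwise loop is meant for illustration rather than for large datasets.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of positive/negative pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical labels (1 = Positive, 0 = Atypical) and classifier scores.
labels = [0, 0, 1, 1, 0, 1]
scores = [0.1, 0.4, 0.35, 0.8, 0.2, 0.9]
auc = roc_auc(labels, scores)
```

In this toy example, eight of the nine positive/negative pairs are ordered correctly; the single misordered pair is the positive scored 0.35 against the negative scored 0.4.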

Results

In this study, we evaluated the classification performance of the models across eight cases, formed by combining two types of mask images with four machine learning models (LR, SVM, RF, and XGB), and compared the results with the labels assigned by three gynecological specialists. Table 2 compares the performance of the machine learning models trained using mask images that include only the acetowhite lesion region with that of the models trained using mask images that include the acetowhite lesion region and 10 pixels along its margins. An acetowhite mask refers to a mask image that includes only acetowhite lesions. An expanded acetowhite mask refers to a mask image that includes acetowhite lesions with 10 pixels along their margins.

Table 2 Comparison table of classification model performance by trained images and machine learning models.

According to Table 2, the models trained on the mask images that include the acetowhite region with an additional 10 pixels from the boundary show the highest performance, with an average AUC of 0.90. For models trained on images containing only the acetowhite region, the LR model shows strong performance with an AUC of 0.79 (CI: 0.72–0.86), while the SVM model has the highest precision at 0.83 (CI: 0.75–0.92). Among the models trained on the mask images with an additional 10 pixels from the acetowhite region, both the SVM and XGB models show high performance, with an AUC of 0.91 (CI: 0.86–0.95). The Friedman test was used to determine whether the performances of the classifiers differed significantly, with a p-value of less than 0.05 indicating a statistically significant difference. Although the expanded acetowhite mask outperformed the acetowhite mask, the differences among the classifiers were statistically significant for the acetowhite mask, with a p-value of 0.020.
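For readers unfamiliar with the Friedman test, the statistic can be sketched as follows. The source does not state how the repeated measurements (blocks) were formed, so the score table here is entirely hypothetical; the function names are also illustrative, and the sketch omits the tie correction that statistical packages usually apply.

```python
def average_ranks(row):
    """Rank values in one block from 1 (lowest) upward, averaging over ties."""
    order = sorted(range(len(row)), key=lambda idx: row[idx])
    ranks = [0.0] * len(row)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and row[order[end + 1]] == row[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # mean of 1-based positions start+1..end+1
        for pos in range(start, end + 1):
            ranks[order[pos]] = shared
        start = end + 1
    return ranks

def friedman_statistic(table):
    """Friedman chi-square for table[block][classifier], without tie correction."""
    n, k = len(table), len(table[0])
    rank_sums = [0.0] * k
    for row in table:
        for j, r in enumerate(average_ranks(row)):
            rank_sums[j] += r
    return 12.0 / (n * k * (k + 1)) * sum(r * r for r in rank_sums) - 3.0 * n * (k + 1)

# Hypothetical AUCs: 5 blocks (e.g. resamples) x 4 classifiers (LR, SVM, RF, XGB).
table = [
    [0.74, 0.79, 0.71, 0.77],
    [0.73, 0.80, 0.70, 0.76],
    [0.75, 0.78, 0.72, 0.77],
    [0.72, 0.79, 0.71, 0.78],
    [0.74, 0.81, 0.70, 0.76],
]
q = friedman_statistic(table)

# Chi-square critical value for alpha = 0.05 with k - 1 = 3 degrees of freedom.
CHI2_CRIT_DF3 = 7.815
significant = q > CHI2_CRIT_DF3
```

Because the test works on within-block ranks rather than raw scores, it makes no normality assumption, which suits comparisons of several classifiers evaluated on the same data.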

Figure 3 shows the five most important features selected using RFE and their importance values for the mask images containing only the acetowhite region. As shown in Fig. 3, the most important features of the mask images containing only the acetowhite region are small area emphasis, Size Zone NonUniformity Normalized, 90th Percentile, Gray Level NonUniformity Normalized, and skewness.

Fig. 3

Importance of 5 features selected via RFE for acetowhite areas.

Figure 4 illustrates the top five most important features, along with their importance values, identified using the Recursive Feature Elimination (RFE) technique for mask images that included the acetowhite region and an additional 10 pixels around its margin. As depicted in Fig. 4, the features of small-area emphasis, 90th percentile, Size Zone NonUniformity Normalized, skewness, and autocorrelation are selected as the most significant features of these mask images.

Fig. 4

Importance of 5 features selected via RFE for areas with an additional 10 pixels from the acetowhite region.

Discussion

The aim of this study was to develop a more standardized cervical cancer diagnostic model using acetowhite region data and to determine the best way to utilize the acetowhite region in developing a cervical grading model. In this study, 464 cervical images were used to extract features that were highly important for cervical image classification. Because radiomics accepts only single-channel inputs and the blue channel captures more information from predominantly reddish cervical images, we extracted features from the blue channel of the RGB images. After feature extraction, we compared the cervical labeling performances of different machine learning models (LR, SVM, RF, and XGB) trained on different mask images. One type of mask image contained only the acetowhite region, and the other included the acetowhite region and the surrounding 10 pixels along the margin of the lesion.

Of the five most important features selected for each type of mask image, four were common to both: small-area emphasis, Size Zone Non-Uniformity Normalized, 90th percentile, and skewness. For both the mask images containing only the acetowhite region and those that included the surrounding 10 pixels, small-area emphasis had the highest importance. Small-area emphasis is a feature extracted from the GLSZM that represents the distribution of zones with the same gray level. A greater small-area emphasis indicates that the image has smaller zones and a finer texture. As cervical cancer advances, more white areas and more diverse patterns appear in the acetowhite regions of cervical images, resulting in smaller zones. This is likely why small-area emphasis was found to have the highest importance. Size Zone Non-Uniformity Normalized (SZNN) also showed high importance in cervical image classification on both mask images. SZNN is a radiomics feature that quantifies the variability in the size of homogeneous regions within an image, normalized to account for differences in image dimensions or total number of zones. Specifically, it measures how uniformly or non-uniformly areas of similar intensity (referred to as “size zones”) are distributed throughout the image. A lower SZNN value indicates that the size zones are more uniformly distributed, whereas a higher value reflects greater non-uniformity in the distribution of these zones. The normalization ensures that the metric can be compared across images of different sizes. As cervical cancer lesions become more severe, the acetowhite areas are expected to show more non-uniformity because parts of the lesion have irregular surfaces, leading to a higher SZNN. The 90th percentile is often used as a statistical feature to describe the distribution of the pixel intensity values or texture features in medical images.
A higher 90th percentile value might indicate areas of high intensity or density, which could correlate with aggressive tumor characteristics, in this case, acetowhite lesions. As cervical cancer became more severe, the 90th percentile value increased. Skewness is a statistical measure that quantifies the asymmetry in the intensity distribution of pixel values within a defined region of interest (ROI) in medical imaging. Positive skewness indicates severe lesions, negative skewness indicates less severe lesions, and a skewness equal to or near zero indicates healthy tissue. Because the SZNN, 90th percentile, and skewness are values that statistically measure texture, this suggests that texture plays an important role in cervical cancer grading using acetowhite lesions.
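To make the four shared features concrete, the sketch below computes them from toy inputs. It follows the commonly used radiomics definitions: small-area emphasis as the zone-count-weighted mean of 1/size², SZNN as the sum of squared per-size zone counts divided by the squared number of zones, and skewness as the third central moment divided by the 1.5 power of the second. The function names, the intensity list, and the GLSZM are hypothetical, and the GLSZM is supplied directly rather than computed from an image.

```python
import numpy as np

def first_order_features(values):
    """90th percentile and skewness of the intensities inside the mask."""
    v = np.asarray(values, dtype=float)
    centered = v - v.mean()
    m2 = np.mean(centered ** 2)  # second central moment (variance)
    m3 = np.mean(centered ** 3)  # third central moment
    # Note: a constant region (m2 == 0) would need special handling.
    return {"p90": float(np.percentile(v, 90)), "skewness": float(m3 / m2 ** 1.5)}

def glszm_features(P):
    """Small-area emphasis and SZNN from a gray-level size zone matrix.

    P[i, j] is the number of zones with gray level i and size j + 1.
    """
    P = np.asarray(P, dtype=float)
    n_zones = P.sum()
    sizes = np.arange(1, P.shape[1] + 1)
    sae = (P / sizes ** 2).sum() / n_zones
    sznn = (P.sum(axis=0) ** 2).sum() / n_zones ** 2
    return {"sae": float(sae), "sznn": float(sznn)}

# Toy intensities: mostly dark pixels with one bright outlier.
intensity = first_order_features([10, 10, 10, 10, 50])

# Toy GLSZM: two gray levels, zone sizes 1 to 4, five zones in total.
texture = glszm_features([[2, 0, 0, 1],
                          [1, 1, 0, 0]])
```

In the toy intensity list, the single bright pixel pulls the distribution's tail to the right, giving a positive skewness, and in the toy GLSZM most zones are one pixel in size, which pushes small-area emphasis toward its maximum of 1.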

We conducted a comparative experiment to determine which of the two mask images was more appropriate for cervical cancer grading. As the goal of this model is to help patients with cervical cancer be diagnosed correctly and receive appropriate treatment, precision and accuracy were used to compare the performance of the models. The models trained using mask images that included both the acetowhite region and the surrounding area showed better overall performance. The p-value for the models trained with the expanded acetowhite mask was 0.634, which suggests that, when the mask image includes the acetowhite area and its surroundings, the performance differences between models are not statistically significant. Among the four models trained on the same dataset, the SVM model performed best, with a precision of 0.87 (95% CI: 0.78–0.95) and an accuracy of 0.82 (95% CI: 0.76–0.88). This can be attributed to the focus of the study on binary classification into two categories (Atypical and Positive), where the SVM’s ability to separate data into distinct categories proved advantageous. The mask image that includes a 10-pixel margin allows the model to identify the characteristics of the acetowhite lesion margin. Low-grade lesions tend to have irregular, feathery, geographic, or angular margins with fine punctations and mosaics, whereas high-grade lesions have regular and well-demarcated borders24. Because the features of the lesion margin, and how clearly the margin is demarcated from tissue that does not show acetowhiteness, differ according to lesion grade, adding 10 pixels along the margin likely contributed to the improved performance of the models.

Despite the widespread use of the acetowhite test by clinicians for diagnosing cervical cancer, research on the correlation between acetowhite regions and cervical cancer progression is relatively scarce. This study is significant because it highlights the strong correlation between the acetowhite regions and cervical grading, providing new insights into their diagnostic value.

When diagnosing cervical lesions, colposcopists identify acetowhite areas, but this visual examination process is inherently subjective, leading to variations in accuracy depending on the colposcopist’s proficiency12. To address this problem, our model used data classified by three different colposcopists together with machine learning methods, so that the visual data could be used for classification from an objective standpoint.

Several classification models for cervical cancer, such as CerviCARE AI, have demonstrated superior performance compared to our model. However, these models utilize additional data beyond the acetowhite areas, such as full-image analysis and patient demographics (e.g., age, exposure to cigarettes). For example, CerviCARE AI used full-image data for training, and YOLO Fastest used additional patient demographics for model assessment17,18. Although these models showed superior performance, their image data are not focused on the acetowhite areas that physicians use for classification, so they are not sufficient to establish a standardized classification model based on acetowhite regions. In contrast, our study relies solely on image data focused on acetowhite areas and their surrounding regions. This approach proposes a novel use of acetowhite areas in image-based diagnosis without incorporating external patient information.

However, this study is limited in that it did not consider additional clinical information used by clinicians for cervical grading. Information such as patient age, cervical screening results, and Pap test outcomes was not incorporated into the model. Future studies should explore the integration of these patient data with cervical images for more comprehensive modeling. Moreover, future research could compare data extracted from the green and blue channels and from the blue and gray channels, or analyze the contrast in the blue channel across the entire cervical image, including the acetowhite region. Another potential direction could be to compare colposcopic images taken before and after the application of acetic acid to analyze the relationship between acetowhiteness and cervical cancer stages.

This study demonstrates the importance of considering not only the acetowhite region but also its surrounding areas when classifying cervical cancer stages based on acetowhite images. This finding suggests a more efficient approach for utilizing acetowhite images in future research, thereby providing a valuable foundation for further exploration.