Background & Summary

Hallux valgus is a common foot deformity in medical practice, often accompanied by significant functional impairment and foot pain. Its prevalence in females is 2.3 to 4.7 times that in males1,2. Accurate, reliable, and reproducible measurement of the hallux valgus angle (HVA) and intermetatarsal angle (IMA) is essential for diagnosing hallux valgus and selecting appropriate surgical treatments3,4,5. Manual methods for measuring HVA and IMA were standardized by the ad hoc committee of the American Orthopaedic Foot & Ankle Society (AOFAS)6,7 and have been shown to be reliable within a 5° margin. However, inter-observer bias remains a challenge, and manual measurement is both time-consuming and labor-intensive.

Recent advancements in deep learning have revolutionized medical imaging tasks such as object detection and segmentation8,9,10,11,12,13. Several attempts have also been made to apply deep learning to hallux valgus angle estimation; current methods typically take a cropped foot image as input and approach the problem as a segmentation task followed by linear regression14,15,16. This involves segmenting line segments representing the great toe, the first metatarsal, and the second metatarsal using segmentation networks, after which linear regression is applied to compute the HVA and IMA. Despite these innovations, training such models requires large, well-annotated datasets, which are particularly difficult to obtain in the medical field because of the sensitive nature of the data and the difficulty of releasing such datasets publicly.

Existing studies on hallux valgus angle estimation are summarized in Table 1. Kwolek et al. pioneered the segmentation-based approach to HVA estimation but trained their model on only 30 images, focusing solely on HVA14. Xu et al. employed an Hourglass neural network trained on 230 X-ray images from 143 patients, with annotations that were not subjected to a rigorous review process16. Takeda et al. and Ma et al. conducted experiments on 1,798 and 2,000 images, respectively, to estimate HVA and IMA15. However, only the 230 images from Xu et al.'s dataset are publicly accessible. Moreover, all these datasets are composed of cropped foot images, which limits their applicability to real-world clinical scenarios, particularly when an X-ray contains both feet or when the majority of the image is not occupied by a foot.

Table 1 Summary of datasets used in existing works on Hallux Valgus Angle Estimation.

To overcome these challenges, we introduce HVAngleEst, the first large-scale, open-access annotated dataset designed for developing hallux valgus angle estimation algorithms. HVAngleEst comprises 1,382 X-ray images collected from the Foot and Ankle Surgery Department, Honghui Hospital of Xi’an Jiaotong University, China. It includes annotations of hallux valgus angles, foot localization boxes, and line segments representing the great toe, the first metatarsal, and the second metatarsal, all on full X-ray images. This dataset enables fully automated, end-to-end hallux valgus angle estimation, minimizing manual effort.

Methods

This study was approved by the institutional ethics committee of Honghui Hospital of Xi’an Jiaotong University (case number 2025-KY-017-01). As the dataset was collected retrospectively and all sensitive information related to patients was anonymized, the ethics committee waived the requirement for obtaining informed consent from patients. The entire data labeling pipeline is illustrated in Fig. 1.

Fig. 1
figure 1

The workflow for creating the HVAngleEst dataset: (1) X-ray images were collected and converted to JPG format with arbitrary file names. (2) Invalid images were removed, and images containing personal information along the edges were cropped. (3) Statistical analysis was performed on the data obtained in stage 2. (4) Four orthopedic doctors annotated the images from stage 2, manually labeling bounding boxes and line segment endpoints; the HVA and IMA were then calculated from these endpoints.

Data collection and cleanup

The HVAngleEst dataset includes 1,587 feet across 1,150 patients, captured in 1,382 X-ray images. The dataset comprises 130 male patients and 1,020 female patients, with 825 left feet and 762 right feet. Patient ages range from 17 to 83 years (mean ± standard deviation: 51.4 ± 15.0 years). Of the 1,382 images, 1,332 DICOM images were acquired on a Siemens device, while 50 images were photographs of printed X-ray films taken with a camera (Canon or Nikon). The inclusion of both high-quality DICOM images and camera-captured film photographs reflects real-world clinical practice, where patients often present with printed X-rays from external institutions and the original DICOM files are unavailable. To minimize financial burdens, clinicians may choose to use these secondary images rather than request new acquisitions. By incorporating both types of images into the training dataset, we aim to replicate realistic clinical workflows, ensuring that an AI algorithm can robustly handle multi-source imaging data of varying quality and provenance. Current research demonstrates that properly designed multi-modal training does not confuse the algorithm; on the contrary, it can force the model to learn more robust and generalizable features17. All photographs in our dataset underwent strict quality control, and the image source is explicitly labeled in the “source” column of datasets.csv, allowing users to filter by modality (DICOM vs. camera) if desired.
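As an illustration of such filtering, the short sketch below loads datasets.csv with pandas and keeps only DICOM-derived entries; the exact label strings stored in the “source” column should be checked against the released file.

```python
import pandas as pd

# Load the dataset index (path is relative to the HVAngleEst root folder).
df = pd.read_csv("HVAngleEst/datasets.csv")

# Keep only rows whose "source" column marks the image as coming from the
# DICOM database; the matching string is an assumption and should be
# verified against the actual column values.
dicom_only = df[df["source"].str.contains("dicom", case=False, na=False)]
print(f"{len(dicom_only)} of {len(df)} entries come from DICOM images")
```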

The dataset contains 620 images of the left foot, 558 images of the right foot, and 204 images showing both feet. A total of 366 feet were labeled with the property “truncated”, indicating partial occlusion or extension beyond the image boundary. DICOM images were anonymized by assigning arbitrary names and converting them to BMP format, removing sensitive metadata. The BMP and camera images were then converted to JPG and renamed using a Python script, and all images were manually reviewed to ensure privacy. During this inspection, personal information such as names and birth dates was found on the edges of 578 images; to ensure patient confidentiality, these images were manually cropped to remove it.
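A minimal sketch of the DICOM-to-JPG conversion step is shown below, assuming pydicom, NumPy, and Pillow are available; the original conversion script is not distributed, so the function name and the simple min-max intensity rescaling used here are illustrative only.

```python
import numpy as np
import pydicom
from PIL import Image

def dicom_to_jpg(dicom_path: str, jpg_path: str) -> None:
    """Read a DICOM file, discard its metadata, and save an 8-bit JPG."""
    ds = pydicom.dcmread(dicom_path)
    pixels = ds.pixel_array.astype(np.float32)
    # Rescale the raw intensities to the 0-255 range (illustrative windowing).
    pixels -= pixels.min()
    if pixels.max() > 0:
        pixels /= pixels.max()
    Image.fromarray((pixels * 255).astype(np.uint8)).save(jpg_path)

# Example (hypothetical paths): dicom_to_jpg("raw/case_001.dcm", "images/IMG000001.jpg")
```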

Image annotation

The X-ray images were annotated using the LabelMe tool (https://github.com/wkentaro/labelme.git)18, and the annotation process was performed by four orthopedic doctors with varying levels of experience: Labeler A (3 years), Labeler B (5 years), Labeler C (over 10 years), and Labeler D (over 20 years).

For foot localization labeling, the IoU (Intersection over Union) between bounding boxes labeled by Labeler A and Labeler B was evaluated. If the IoU exceeded 0.95, Labeler B’s annotation was accepted. Otherwise, Labeler C reviewed the task. If the IoU between the bounding boxes annotated by Labeler C and those by either Labeler A or Labeler B exceeded 0.95, Labeler C’s annotation was considered final. Otherwise, the task was escalated to Labeler D, whose annotation was considered definitive. Bounding boxes were categorized as “left” or “right” foot, and the “truncated” property was assigned if the foot was either partially occluded or extended beyond the image boundary.
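The escalation rule above can be summarized in a few lines; the sketch below uses axis-aligned boxes in (x_min, y_min, x_max, y_max) form and is not the annotation tooling actually used.

```python
def box_iou(a, b):
    """IoU of two boxes given as (x_min, y_min, x_max, y_max)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter) if inter > 0 else 0.0

def resolve_box(box_a, box_b, box_c=None, box_d=None, thr=0.95):
    """Escalation rule described for foot bounding boxes."""
    if box_iou(box_a, box_b) > thr:
        return box_b        # A and B agree: keep B's annotation
    if box_c is not None and (box_iou(box_c, box_a) > thr or box_iou(box_c, box_b) > thr):
        return box_c        # Labeler C agrees with A or B: keep C's annotation
    return box_d            # otherwise Labeler D's annotation is definitive
```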

HVA, IMA and line segments were annotated in accordance with the standards set by AOFAS4,5, as shown in Fig. 2. First, two parallel lines (red dashed lines) were drawn for each of the great toe, the first metatarsal, and the second metatarsal. Second, the center line of each phalanx (blue solid line) was drawn through the midpoints of the two parallel lines. Third, the intersection points of these center lines with the ends of each phalanx were identified and served as the endpoints of the corresponding phalanx line segments. Finally, the HVA and IMA were calculated from the labeled endpoints of each phalanx. If the discrepancy between the HVA and IMA annotations by Labelers A and B was less than 1°, Labeler B’s annotation was selected. If the discrepancy exceeded 1°, Labeler C reviewed the task. If Labeler C’s annotations differed by less than 1° from those of either Labeler A or B, Labeler C’s annotations were accepted. Otherwise, Labeler D’s annotation was considered final. An example of images and their annotations from HVAngleEst is shown in Fig. 3.

Fig. 2
figure 2

The workflow for HVA, IMA, and line segments labeling: (1) Two parallel lines (red dashed lines) were drawn to annotate the great toe, the first metatarsal, and the second metatarsal. (2) The midpoints of these parallel lines were joined to create a straight line. (3) The points where this straight line intersects the edge of the phalanx defined the line segment endpoints for that phalanx. The points (a, b), (c, d), and (e, f) correspond to the line segments of the great toe, the first metatarsal, and the second metatarsal. (4) HVA and IMA were calculated from the endpoints of each phalanx.
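With this convention, HVA is the angle between the great toe axis (a, b) and the first metatarsal axis (c, d), and IMA is the angle between the first (c, d) and second (e, f) metatarsal axes. A minimal sketch of that computation is given below; it is illustrative and not the released points2angle.ipynb notebook.

```python
import numpy as np

def segment_angle_deg(p1, p2, q1, q2):
    """Unsigned angle in degrees between segments p1->p2 and q1->q2."""
    u = np.asarray(p2, dtype=float) - np.asarray(p1, dtype=float)
    v = np.asarray(q2, dtype=float) - np.asarray(q1, dtype=float)
    cos = abs(np.dot(u, v)) / (np.linalg.norm(u) * np.linalg.norm(v))
    return float(np.degrees(np.arccos(np.clip(cos, 0.0, 1.0))))

# a, b, c, d, e, f are the annotated endpoints from Fig. 2:
# hva = segment_angle_deg(a, b, c, d)   # great toe vs. first metatarsal
# ima = segment_angle_deg(c, d, e, f)   # first vs. second metatarsal
```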

Fig. 3
figure 3

Examples of annotated images of one foot (top) and two feet (bottom). On the left is the de-identified original image; in the middle are the annotated boxes and line segments; and on the right is the fused image with the annotation results. The foot in the top image is labeled as “truncated” due to part of the foot being obstructed by the leg.

Data Records

The HVAngleEst dataset is publicly available for download via Science Data Bank (https://www.scidb.cn/en/s/FVFFnq)19 and can be accessed without registration. Figure 4 presents the dataset’s folder structure and file formats in detail.

Fig. 4
figure 4

The folder structure of the HVAngleEst dataset.

The root folder of the dataset is named “HVAngleEst”. It contains subfolders named “images”, “annotations”, “splits”, and “tools”, as well as a “datasets.csv” file. Figure 4 provides an overview of the folder structure.

The CSV file “datasets.csv” contains the following fields. “image_id” is the unique identifier of the image; “patient_id” refers to the patient to whom the image belongs; “filename” is the name of the X-ray image file; “source” indicates the image source, either the DICOM database or a camera; “image_width” and “image_height” are the width and height of the image in pixels; “boxes” provides the coordinates of the bounding box in the format “XMin, YMin, XMax, YMax”, expressed in normalized image coordinates; “labels” specifies the object category, either “left” or “right”; “properties” indicates whether the foot is truncated; “great_toe”, “first_metatarsal”, and “second_metatarsal” give the coordinates of the two endpoints of each phalanx, formatted as “X1, Y1, X2, Y2” and also expressed in normalized image coordinates; “HVA” and “IMA” are the hallux valgus angle and the intermetatarsal angle.
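Because boxes and endpoints are stored in normalized coordinates, they must be scaled by the image width and height before use. The sketch below assumes each row describes a single foot with comma-separated coordinate strings as described above; rows covering two-foot images may require additional parsing.

```python
import pandas as pd

df = pd.read_csv("HVAngleEst/datasets.csv")
row = df.iloc[0]
w, h = row["image_width"], row["image_height"]

# Convert the normalized bounding box "XMin, YMin, XMax, YMax" to pixels.
x_min, y_min, x_max, y_max = [float(v) for v in str(row["boxes"]).split(",")]
box_px = (x_min * w, y_min * h, x_max * w, y_max * h)

# Convert a phalanx segment "X1, Y1, X2, Y2" to pixel-space endpoints.
x1, y1, x2, y2 = [float(v) for v in str(row["great_toe"]).split(",")]
great_toe_px = ((x1 * w, y1 * h), (x2 * w, y2 * h))
```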

The “images” folder contains radiographs of feet. Each image filename begins with “IMG” followed by a zero-padded six-digit unique identifier and ends with the “.jpg” extension.

The “annotations” folder includes two subfolders: “boxes” and “masks”. The “boxes” folder contains annotation files for the localization task, organized into three subfolders: “COCO”, “YOLO”, and “PASCAL VOC”. The “COCO” folder includes a JSON file, “COCO_feet_det.json”, formatted according to the COCO20 standard. The “YOLO” and “PASCAL VOC” folders contain “.txt” and “.xml” files, respectively, which are named after the corresponding image files and follow the standard YOLO21 and PASCAL VOC22 formats. Additionally, the “YOLO” folder includes a “classes.txt” file listing the two localization categories, “left” and “right”. The “masks” folder contains “.png” files named after the corresponding image files. These line segment masks were generated from the annotated endpoints with a line width of 4 pixels, which was experimentally validated to yield the best hallux valgus angle estimation results15,16. The pixels of the great toe, the first metatarsal, and the second metatarsal were assigned values of 1, 2, and 3, respectively, while the background was assigned a value of 0.
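The sketch below shows how such masks can be rasterized from pixel-space endpoints with OpenCV; the released data2mask.ipynb notebook may differ in its details.

```python
import cv2
import numpy as np

def make_line_mask(height, width, great_toe, first_meta, second_meta, thickness=4):
    """Rasterize the three phalanx segments into a single-channel class mask."""
    mask = np.zeros((height, width), dtype=np.uint8)  # 0 = background
    for class_id, (p1, p2) in enumerate([great_toe, first_meta, second_meta], start=1):
        cv2.line(mask,
                 tuple(int(round(c)) for c in p1),
                 tuple(int(round(c)) for c in p2),
                 color=class_id, thickness=thickness)
    return mask

# mask = make_line_mask(h, w, great_toe_px, first_meta_px, second_meta_px)
# cv2.imwrite("annotations/masks/IMG000001.png", mask)
```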

The “tools” folder contains Python notebooks that support dataset preparation and algorithm implementation. During dataset preparation, the notebook “points2angle.ipynb” was used to calculate HVA and IMA from the annotated endpoints of the phalanges. The notebook “data2yolo.ipynb” generated ground truth annotations in YOLO format for localization, while “data2mask.ipynb” created customized line segment masks from the annotated phalanx endpoints. Additionally, the YOLO format ground truth can be converted into the PASCAL VOC or COCO format using “yolo2voc.ipynb” or “yolo2coco.ipynb”, respectively. During the linear regression stage, “seg2angle.ipynb” was employed to compute HVA and IMA from the output of the segmentation model.
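One common way to recover the angles from a predicted mask, mirroring this final linear regression step, is to fit a line to the pixels of each class and measure the angle between the fitted directions. The sketch below uses cv2.fitLine and is illustrative rather than a copy of seg2angle.ipynb.

```python
import cv2
import numpy as np

def class_direction(mask, class_id):
    """Fit a line to all pixels of one class and return its unit direction."""
    ys, xs = np.where(mask == class_id)
    pts = np.stack([xs, ys], axis=1).astype(np.float32)
    fit = cv2.fitLine(pts, cv2.DIST_L2, 0, 0.01, 0.01).flatten()
    return fit[:2]  # (vx, vy), already normalized by OpenCV

def angle_between_classes(mask, class_a, class_b):
    u, v = class_direction(mask, class_a), class_direction(mask, class_b)
    cos = abs(float(np.dot(u, v)))
    return float(np.degrees(np.arccos(np.clip(cos, 0.0, 1.0))))

# hva = angle_between_classes(pred_mask, 1, 2)   # great toe (1) vs. first metatarsal (2)
# ima = angle_between_classes(pred_mask, 2, 3)   # first (2) vs. second metatarsal (3)
```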

Under the “splits” folder, there are three CSV files named “train.csv”, “val.csv”, and “test.csv”. Each file lists the images used for training, validation, and testing in the technical validation.

Technical Validation

To verify the dataset’s suitability for training deep learning algorithms, a foot localization model and a segmentation model were trained using YOLOv11s23 and DeepLabV324 with a MobileNetV325 backbone, respectively, followed by a linear regression module to estimate HVA and IMA.

The dataset was randomly divided into 70% (967 images) for training, 20% (275 images) for validation, and 10% (140 images) for testing; the same split was used for the localization and segmentation models as well as the linear regression task. Training was performed on a Windows system equipped with an NVIDIA GeForce RTX 4060 Ti GPU with 16 GB of VRAM. The YOLOv11s localization model was pre-trained on the COCO dataset, while the MobileNetV3 segmentation backbone was pre-trained on ImageNet26. The localization model was trained for 50 epochs, whereas the segmentation model was trained for 200 epochs. For the localization task, the input size was 640. For the segmentation and linear regression tasks, each foot was cropped using its bounding box, and the shorter side was resized to 512 pixels.
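A brief sketch of the crop-and-resize preprocessing described above is shown below, assuming pixel-space bounding boxes and Pillow; the exact resizing code used in training is not part of the dataset.

```python
from PIL import Image

def crop_and_resize(image_path, box_px, short_side=512):
    """Crop a foot by its bounding box and resize so the shorter side is 512 px."""
    img = Image.open(image_path)
    x_min, y_min, x_max, y_max = [int(round(v)) for v in box_px]
    foot = img.crop((x_min, y_min, x_max, y_max))
    w, h = foot.size
    scale = short_side / min(w, h)
    return foot.resize((int(round(w * scale)), int(round(h * scale))), Image.BILINEAR)
```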

Localization performance

For the localization task, the feet were detected with a precision of 0.997, a recall of 1.0, and a mAP (mean Average Precision) of 0.994 at an IoU threshold of 0.5 on the test set.

Segmentation performance

For the segmentation task, the line segments of the great toe, the first metatarsal, and the second metatarsal were predicted with a mean IoU (mIoU, mean Intersection over Union) of 0.538 and a pixel accuracy of 0.993 on the test set. Table 2 presents the mIoU for each category.

Table 2 The mIoU for each phalanx in the segmentation task.
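For reference, both reported metrics can be computed from a predicted and a ground-truth class mask as follows. This is a generic sketch rather than the evaluation script used here, and it assumes the background class is excluded from the mIoU.

```python
import numpy as np

def segmentation_metrics(pred, target, num_classes=4):
    """Per-class IoU, their mean (mIoU), and overall pixel accuracy."""
    ious = []
    for c in range(1, num_classes):            # skip background class 0
        inter = np.logical_and(pred == c, target == c).sum()
        union = np.logical_or(pred == c, target == c).sum()
        ious.append(inter / union if union > 0 else float("nan"))
    pixel_acc = (pred == target).mean()
    return ious, float(np.nanmean(ious)), float(pixel_acc)
```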

Linear regression performance

Table 3 presents the percentages of HVA and IMA estimates with errors below 3° and 5° on the test set. For errors below 3°, the percentages were 94.4% for HVA and 79.4% for IMA; for errors below 5°, they were 96.9% for HVA and 93.8% for IMA.

Table 3 The percentages of HVA and IMA with errors below 3°, and 5° on the test set.
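These percentages are simply the fraction of test feet whose absolute angle error falls below each threshold, as sketched below.

```python
import numpy as np

def within_threshold(pred_angles, true_angles, threshold_deg):
    """Fraction of predictions whose absolute error is below the threshold."""
    errors = np.abs(np.asarray(pred_angles) - np.asarray(true_angles))
    return float((errors < threshold_deg).mean())

# e.g. within_threshold(pred_hva, gt_hva, 3.0) -> proportion of HVA errors < 3°
```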

Future research could compare the accuracy of weightbearing CT scans and X-rays in evaluating hallux valgus deformity, thereby providing more evidence-based guidance for clinical decision-making27,28,29.