Introduction

The automatic detection of craters is a fundamental task in planetary science, with significant implications for geological analysis1, spacecraft navigation2, and planetary surface exploration3. In particular, crater identification supports safe spacecraft navigation over hazardous terrain and the exploration of planetary resources. Traditional crater identification relies primarily on manual visual interpretation, which is labour-intensive, time-consuming, and susceptible to subjective bias. Automated crater detection methods can be broadly categorised into traditional computer vision and machine learning approaches4 and deep learning-based approaches5. Traditional computer vision methods typically rely on manually designed features such as edges, contours, and shaded regions, and employ template matching and machine learning classifiers6. However, these methods face substantial limitations when dealing with craters of varying sizes and morphological characteristics7.

Deep learning has revolutionised computer vision, enabling significant advances in tasks such as image classification, object detection, and segmentation8. Deep learning methods eliminate the need for manual feature extraction by learning representations directly from image data8,10, which significantly enhances the accuracy and adaptability of crater detection9. For example, Silburt et al.11 applied U-Net to lunar datasets and detected craters with high precision. Yang et al.12 used high-resolution feature pyramid networks to detect small-scale craters effectively. Deep learning models have also demonstrated superior accuracy and generalisation when applied to large-scale, high-resolution remote sensing datasets13.

Convolutional neural networks (CNNs)14 are widely used in visual recognition tasks such as image classification, face recognition, image denoising, and scene recognition. CNN-based semantic segmentation models have been used for high-resolution identification and characterisation of craters from imagery11. Furthermore, CNN variants such as U-Net15 and Faster R-CNN16 have been fine-tuned to cope with the wide variation in the sizes, shapes, and degradation states of craters17. Among CNN-based models, YOLO (You Only Look Once)18 has been used for real-time object detection with high accuracy.

Residual networks (ResNet)19 have been prominent in classification tasks and are also used as backbone feature extractors in object detectors, including some YOLO variants. ResNet and its variants (such as ResNet-50) are deep CNNs with residual (skip) connections, which mitigate the vanishing gradient problem of conventional deep CNNs. This allows deeper networks to be trained, which greatly increases feature expressiveness and recognition accuracy. Owing to this deep residual architecture, researchers commonly employ ResNet-50 for image classification and as the feature extraction network for object detection and image segmentation20. It also performs well in more complicated visual tasks, such as medical diagnosis21 and industrial applications22. Since craters vary in shape and have fuzzy boundaries, conventional CNNs have difficulty extracting their complex features effectively. Several studies have trained deep learning and machine learning models on data from NASA (National Aeronautics and Space Administration) for crater detection9,23. Owing to its deep residual structure, ResNet has the potential to perform better under strong background interference, which can improve the robustness and accuracy of classification. Faster R-CNN16 is a two-stage object detection model that improves both accuracy and efficiency by sharing convolutional features and regressing candidate boxes. It achieves high precision in a range of realistic scenarios, such as remote sensing-based target localisation24, traffic sign recognition for autonomous driving25, and medical lesion detection26.
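To make the idea of a residual connection concrete, the following is a minimal NumPy sketch of the computation y = ReLU(F(x) + x). It is an illustrative simplification, not the ResNet-50 used in this study: dense weight matrices stand in for the convolution and batch normalisation layers of a real residual block, and the names `residual_block`, `w1`, and `w2` are hypothetical.

```python
import numpy as np

def relu(x):
    """Element-wise rectified linear unit."""
    return np.maximum(x, 0.0)

def residual_block(x, w1, w2):
    """Compute relu(F(x) + x), where F(x) = w2 @ relu(w1 @ x) is the residual branch.

    Hypothetical simplification: dense matrices replace the convolution and
    batch-normalisation layers of a real ResNet block, and x keeps its shape
    so that the identity shortcut can be added directly.
    """
    f = w2 @ relu(w1 @ x)  # residual branch F(x)
    return relu(f + x)     # identity shortcut added before the final activation

x = np.array([1.0, -2.0, 3.0])
zero = np.zeros((3, 3))
print(residual_block(x, zero, zero))  # branch outputs 0, so the block returns relu(x)
```

Because the shortcut passes x through unchanged, a block whose residual branch outputs zero reduces to the identity (followed by ReLU); this direct path is what lets gradients reach early layers in very deep stacks.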

Furthermore, machine learning paradigms such as transfer learning27 have enabled models trained on low-resolution datasets to perform well on high-resolution datasets. Yang et al.3 applied deep and transfer learning to data from the Chinese Chang’E-1 and Chang’E-2 missions11, identifying over 117,000 lunar craters and estimating the ages of nearly 19,000. The authors used a two-stage CNN-based model for multiscale detection and age classification, integrating morphological and stratigraphic features to achieve high accuracy. Such models can continue to expand and refine crater catalogues, and they also provide insight into the impact history and geological evolution that inform space exploration and mission planning.

A major difficulty in crater detection is data labelling: high-quality labelled datasets are rare, and manual annotation is laborious and costly. A second difficulty is scale variation; crater diameters range from a few metres to a few hundred kilometres, so models must handle very different scales effectively. Finally, topographic complexity has proven to be an obstacle to generalisation, given the different surface features of different planetary bodies4. Hence, models trained on one dataset often perform poorly on others, especially across planetary bodies, owing to differences in data resolution, lighting conditions, and terrain features9. Limited computational resources and class imbalance also hinder model performance, especially when training deep networks or detecting rare crater types in imbalanced datasets. However, modern deep learning models such as ResNet and YOLO have the potential to address some of these limitations.

In this paper, we present a robust deep learning framework for automated crater identification and classification. The framework features a two-stage process, in which YOLO is used for detection and a set of deep learning models is used to classify the detected craters by type. We evaluate three deep learning models (ResNet, CNN, and YOLO) for crater classification. We provide an annotated crater dataset, which we use to train the models to classify craters as large, medium, or small. The framework aims to enhance our understanding of planetary surface processes, which is crucial for space exploration tasks such as landing site selection and resource assessment. Finally, the framework generates a report with statistics on the types of craters in a given region. We apply it to selected regions of Mars and the Moon using remote sensing data.

The rest of this paper is organised as follows. The Introduction provides background on crater exploration, and Methods introduces the methodology, including data processing and modelling. Results presents our findings, and Discussion examines the potential factors that influenced them.

Methods

NASA dataset

NASA is a United States federal agency popularly known for managing space programmes and spearheading aerospace and space science research. NASA is well known for sharing data with national and international research institutions and has been at the forefront of space research globally.

The dataset utilised in this study consists of high-resolution satellite images of planetary surfaces, primarily focusing on Mars and Mercury. These images were sourced from NASA’s publicly accessible archives, which include data from missions such as the Mars Reconnaissance Orbiter (MRO)28 and the MESSENGER (MErcury Surface, Space ENvironment, GEochemistry, and Ranging) spacecraft29. These missions have provided extensive imagery of planetary surfaces, capturing a wide range of geological features. The dataset encompasses images with diverse resolutions, lighting conditions, and surface characteristics, making it well suited to training and evaluating deep learning models for crater detection and classification11.

Roboflow Universe dataset

Roboflow Universe (https://universe.roboflow.com/) is a community-driven, collaborative platform through which the computer vision community shares high-quality datasets30 and pre-trained models. It hosts thousands of openly sourced, annotated, and collated datasets, allowing users to bypass much of the tedious work of data preparation and cleaning. Supported tasks include object detection, image segmentation, classification, and object tracking. The platform also hosts widely used models such as YOLO and Faster R-CNN, which users can download or try online, enabling fast prototyping and deployment. Roboflow Universe further provides a full API and a simple Software Development Kit (SDK) for quick integration and deployment of models on web, mobile, and edge devices. In this study, we use Roboflow Universe because it gives free access to lunar crater image datasets created and shared by other researchers, which fosters knowledge and project sharing among peers. The lunar crater data on Roboflow Universe were acquired by different lunar exploration missions under varied imaging conditions. This diversity can improve the generalisation performance of the trained model and its robustness in real-world scenes.

Manual data labelling

We analyse crater datasets from Mars and the Moon obtained from the NASA Planetary Data System31. The Mars region covers the mid-to-low southern latitudes (Latitude: −25.40° to −25.17°, Longitude: 196.23°E to 196.34°E), while the Moon region lies on the western edge of the equatorial region of the near side (Latitude: −1.60° to −1.27°, Longitude: 318.04°E to 318.16°E). We categorise the craters into three size groups by diameter: small (<10 km), medium (10–50 km), and large (>50 km). The Mars dataset contains 135 small, 20 medium, and 16 large craters (171 in total), and the Moon dataset contains 611 small, 45 medium, and 15 large craters (671 in total). Our labelled data are available in a GitHub repository (https://github.com/sydney-machine-learning/crater-identification).
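The size thresholds and class counts above can be captured in a few lines of Python. This sketch is illustrative only: the function name `size_class` is hypothetical, and assigning exactly 10 km and exactly 50 km to the medium class is an assumption, since the text does not say how boundary values are handled. The loop also shows how strongly small craters dominate both datasets.

```python
def size_class(diameter_km: float) -> str:
    """Map a crater diameter (km) to the size classes used in the text.

    Assumption: diameters of exactly 10 km and 50 km are treated as medium.
    """
    if diameter_km < 10:
        return "small"
    if diameter_km <= 50:
        return "medium"
    return "large"

# Class counts reported for the two labelled regions
COUNTS = {
    "Mars": {"small": 135, "medium": 20, "large": 16},
    "Moon": {"small": 611, "medium": 45, "large": 15},
}

for body, counts in COUNTS.items():
    total = sum(counts.values())
    share_small = counts["small"] / total
    print(f"{body}: {total} craters, {share_small:.1%} small")
```

Small craters make up roughly 79% of the Mars labels and 91% of the Moon labels, which is the class imbalance that later affects the macro-averaged scores.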

Framework

We apply our framework to selected regions of Mars and the Moon. For crater detection, the framework uses YOLO-v11n, the nano variant of YOLO-v11, within a two-model design for large remote sensing images. Conventional deep learning models for object detection and identification, such as CNNs, take input images of a fixed size. This is problematic in applications such as crater detection, where crater size varies and a crater may not fit within a single input image. In such cases, an input image may contain incomplete craters, which are difficult to identify. Moreover, the task has two stages: the framework first detects craters and then identifies their type, e.g., small, medium, large, or incomplete craters. To build an effective detection framework, we adopt the Non-Maximum Suppression (NMS) method presented by Neubeck and Van Gool32, which removes overlapping boxes and improves detection accuracy. This lets us handle varying crater sizes and crowded regions in planetary images more effectively.
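As an illustration of how overlapping boxes can be removed, the following is a minimal Python sketch of the greedy, IoU-based form of NMS commonly used in object detection. It is an assumption for illustration, not necessarily the exact formulation of Neubeck and Van Gool32 or of the Ultralytics implementation; the function names and the default IoU threshold of 0.5 are hypothetical choices.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, iou_threshold=0.5):
    """Greedy NMS: repeatedly keep the highest-scoring box and discard the
    remaining boxes whose IoU with it exceeds iou_threshold.
    Returns the indices of the kept boxes, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        order = [i for i in order if iou(boxes[best], boxes[i]) <= iou_threshold]
    return keep


# Two heavily overlapping detections of one crater plus a separate crater
boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
print(nms(boxes, scores))  # [0, 2]
```

The first two boxes overlap with an IoU of about 0.68, so the lower-scoring one is suppressed, while the distant third box survives.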

Figure 1 presents our framework, which comprises three deep learning models (CNN, YOLO, and ResNet-50) for crater detection and identification on Mars and the Moon. The first step is data acquisition and preprocessing, with data obtained from two sources: high-resolution images of the Martian surface from the NASA official website33 and images of lunar craters from the Roboflow public platform34. Because image sizes differ between the sources, we preprocessed each source separately. We cropped and resized the Mars images and then relabelled them to meet the model’s input requirements. We padded the lunar images with black borders to obtain a uniform image size and then relabelled them.
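The following NumPy sketch shows one way black padding can be combined with label adjustment. It assumes YOLO-format labels (class, x-centre, y-centre, width, height, all normalised to [0, 1]) and that padding is added on the bottom and right edges only; the text does not state where the padding is placed, and the function name `pad_with_labels` is hypothetical. Because the original pixels keep their positions under this assumption, only the normalised coordinates need rescaling.

```python
import numpy as np

def pad_with_labels(img, labels, target_h, target_w):
    """Pad an image with black (zero) pixels to target_h x target_w and rescale YOLO labels.

    Assumption: padding is added on the bottom and right, so pixel positions are unchanged.
    """
    h, w = img.shape[:2]
    if target_h < h or target_w < w:
        raise ValueError("target size must not be smaller than the image")
    padded = np.zeros((target_h, target_w) + img.shape[2:], dtype=img.dtype)
    padded[:h, :w] = img  # original image in the top-left corner
    sx, sy = w / target_w, h / target_h  # shrink factors for normalised coordinates
    new_labels = [
        (cls, xc * sx, yc * sy, bw * sx, bh * sy)
        for cls, xc, yc, bw, bh in labels
    ]
    return padded, new_labels

# Example: a 100 x 200 grey image padded to 200 x 200
img = np.full((100, 200, 3), 128, dtype=np.uint8)
padded, labels = pad_with_labels(img, [(1, 0.5, 0.5, 0.1, 0.2)], 200, 200)
print(padded.shape, labels)
```

If the padding were instead centred, the centre coordinates would also need an offset term; the bottom/right choice keeps the conversion to a pure rescaling.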

Fig. 1: Machine learning framework for crater detection and identification.

The framework features a two-stage process that requires two separate deep learning models, where YOLO (Step 4) is used in the first stage for detection. In the second stage (Step 5), we compare CNN, YOLO and ResNet for classification.

In Step 2, we perform data visualisation and annotation correction to ensure data quality and validity. We carry out a detailed visual analysis of all processed data to observe the distribution characteristics of craters, paying special attention to errors arising from manual annotation, such as large craters incorrectly labelled as medium-sized. We then correct and relabel such craters one by one. In Step 3, we report statistical insights that give an overview of the data, including the distribution of the different types (classes) of craters. Step 4 trains the YOLO model for crater detection using a two-class approach, i.e., crater vs background. To enable YOLO to process large images, we design two prediction strategies:

  • Direct prediction method: We feed each high-definition 4K-resolution image into the YOLO model as a whole. This approach detects large craters efficiently and accurately, without producing the duplicate detection boxes that arise from overlapping regions.

  • Sliding window prediction method: We break each image down into small regions (chips) of 640 × 640 pixels, with a 30% overlap between neighbouring chips so that no targets are missed. This improves the recognition accuracy for small craters but performs poorly for large craters that are not captured within a single chip. We further apply the Non-Maximum Suppression (NMS)32 technique to remove the redundant boxes produced by the sliding window, which improves the efficiency and accuracy of end-to-end prediction.
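The chip layout of the sliding window method can be sketched in Python as follows. The example is illustrative: it assumes that a 30% overlap means a stride of 70% of the chip size (448 pixels for 640-pixel chips), it adds a final chip flush with the image border so the edges are covered, and the function names and the 3840 × 2160 image size are hypothetical. Detections from each chip are shifted back into full-image coordinates, after which overlapping boxes from neighbouring chips would be merged with NMS.

```python
def chip_starts(length, chip=640, overlap_pct=30):
    """Start offsets along one axis so that chips of size `chip` cover [0, length)."""
    if length <= chip:
        return [0]
    stride = chip * (100 - overlap_pct) // 100  # 448 px for 640 px chips and 30% overlap
    starts = list(range(0, length - chip + 1, stride))
    if starts[-1] + chip < length:  # final chip aligned with the image border
        starts.append(length - chip)
    return starts


def chips(width, height, chip=640, overlap_pct=30):
    """Chip windows as (x1, y1, x2, y2) in full-image pixel coordinates."""
    return [
        (x, y, min(x + chip, width), min(y + chip, height))
        for y in chip_starts(height, chip, overlap_pct)
        for x in chip_starts(width, chip, overlap_pct)
    ]


def to_image_coords(box, window):
    """Shift a box predicted inside a chip back to full-image coordinates."""
    x1, y1, x2, y2 = box
    ox, oy = window[0], window[1]
    return (x1 + ox, y1 + oy, x2 + ox, y2 + oy)


windows = chips(3840, 2160)
print(len(windows), windows[-1])
```

Integer percentages are used for the stride so that the chip positions do not depend on floating-point rounding.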

In Step 5, we focus on crater classification involving three classes (large, medium, and small craters), comparing CNN, ResNet, and YOLO to find which model best suits the task. We implemented a specialised data loading and preprocessing pipeline for the CNN model, since non-crater images were labelled separately in the YOLO dataset format. In Step 6, we evaluate the models using F1-score, precision, recall, and accuracy.

CNN model

In planetary science, deep learning models have been increasingly adopted for tasks such as crater detection, terrain classification, and rock identification11. Traditional methods for crater detection, which often rely on hand-crafted feature extraction techniques such as edge detection and template matching, struggle with challenges including noise, varying illumination, and the diverse morphology of craters35. In contrast, deep learning models, particularly CNNs, have demonstrated the ability to automatically learn relevant features from raw data, making them highly effective for crater identification tasks36. CNNs can automatically learn to detect features such as crater rims, shadows, and textures, which are critical for distinguishing craters from other geological formations37. The ability of CNNs to generalise from large datasets has made them a popular choice for planetary science applications, where manual feature extraction is often impractical38. For example, Silburt et al.11 employed CNNs to classify lunar craters based on their morphological features, achieving robust performance even in complex terrains. Similarly, the YOLO model has been utilised for real-time crater detection in planetary exploration missions, offering a balance between speed and accuracy39. These advancements highlight the potential of deep learning to revolutionise planetary science by automating and improving the accuracy of crater detection and analysis. However, CNNs require large amounts of labelled data for training, and their performance may degrade when applied to datasets with significant domain shifts or limited annotations.

Figure 2 shows our pipeline using the CNN as a benchmark model for classifying crater images. To compare it with more advanced models such as YOLO and ResNet-5019 on the same YOLO-format dataset, we crop each crater region in the original image into a separate 128 × 128 pixel image. We also introduce a “non-crater” category, giving a four-class classification task. The data are organised into training, validation, and test directories, and are loaded and augmented using the PyTorch Dataset class40 and the Keras ImageDataGenerator41. Augmentation includes pixel value scaling, random rotation, and horizontal flipping to enhance the robustness and generalisation of the model. The images and labels are then converted to tensors for faster computation, the pixel values are scaled, and each sample is returned as a tuple containing the image and its label. The model input shape is 128 × 128 × 3, corresponding to images standardised to 128 × 128 pixels with three colour channels. The model was trained with the Adam optimiser for 30 epochs using categorical cross-entropy as the loss function42. The final results were evaluated using accuracy, loss, classification reports, and confusion matrices.
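The cropping step can be sketched in NumPy as follows. The sketch is an assumption for illustration: it converts a normalised YOLO box to pixel corners, crops the crater, and resizes it to 128 × 128 with simple nearest-neighbour sampling (in practice a library resize such as OpenCV’s would normally be used), and the function names are hypothetical. It returns an (image, label) tuple, mirroring the sample format described above.

```python
import numpy as np

def yolo_to_pixels(xc, yc, w, h, img_w, img_h):
    """Convert a normalised YOLO box (centre, width, height) to pixel corners clipped to the image."""
    x1 = max(0, int(round((xc - w / 2) * img_w)))
    y1 = max(0, int(round((yc - h / 2) * img_h)))
    x2 = min(img_w, int(round((xc + w / 2) * img_w)))
    y2 = min(img_h, int(round((yc + h / 2) * img_h)))
    return x1, y1, x2, y2

def resize_nearest(img, size=128):
    """Nearest-neighbour resize of an H x W (x C) array to size x size."""
    h, w = img.shape[:2]
    rows = np.arange(size) * h // size
    cols = np.arange(size) * w // size
    return img[rows[:, None], cols[None, :]]

def crop_crater(img, label, size=128):
    """Crop one YOLO-labelled crater and return an (image, class index) tuple."""
    cls, xc, yc, w, h = label
    x1, y1, x2, y2 = yolo_to_pixels(xc, yc, w, h, img.shape[1], img.shape[0])
    return resize_nearest(img[y1:y2, x1:x2], size), int(cls)

# Example: crop a crater centred in a 640 x 640 RGB tile
tile = np.random.default_rng(0).integers(0, 256, (640, 640, 3), dtype=np.uint8)
patch, cls = crop_crater(tile, (2, 0.5, 0.5, 0.1, 0.1))
print(patch.shape, cls)  # (128, 128, 3) 2
```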

Fig. 2: CNN architecture.

A series of convolutional and pooling layers is used to process the input crater image after detection for the classification of crater type.

YOLO model

Unlike traditional two-stage detectors, the YOLO model18 predicts bounding boxes and class probabilities directly from the input image in a single forward pass, enabling real-time performance43. This efficiency makes YOLO particularly suitable for applications requiring fast processing, such as onboard spacecraft systems or large-scale planetary image analysis44. However, the single-stage design of YOLO can sometimes result in lower precision for small or densely packed objects, which may require additional optimisation for specific tasks such as crater detection45.

Figure 3 shows how we use YOLO, which plays two distinct roles in our framework. In the first role, YOLO serves mainly as an annotation format: the original dataset is annotated in YOLO format, which traditional CNN models cannot process directly, particularly with a background (non-crater) category. Each YOLO-annotated image is therefore cropped to a single crater and resized to 128 × 128 pixels for CNN training, so that YOLO mainly supplies accurate positional information for data preprocessing. In the second role, we apply the YOLO model itself, using the latest version (YOLO-v11) to train on and detect large, medium, and small craters in the image data. We also offer two strategies for predicting on ultra-large 4K-resolution planetary surface images: the first feeds the complete image into YOLO for fast and efficient detection of large craters, and the second divides the image into 640 × 640 regions using a sliding window46. After each region has been predicted, Non-Maximum Suppression (NMS)32 consolidates overlapping detection boxes, which substantially raises the precision of small crater detection. In summary, YOLO in the first role acts mainly as a data preprocessing step that supplies CNN training data, whereas in the second role it serves as the principal model for large-scale, multi-scale crater detection. Through these prediction strategies and post-processing steps, we demonstrate the advantages and practical applicability of YOLO in real deep learning applications.

Fig. 3: YOLO architecture.

A series of convolutional and pooling layers is used to process the input crater image for detection in Step 4 of the framework (Fig. 1).

ResNet model

ResNet-50 is a 50-layer implementation of the Residual Network (ResNet) architecture, which uses skip connections to help mitigate the vanishing gradient problem in deep networks36. These connections allow the network to learn residual mappings, enabling the training of very deep architectures without degradation in performance47. ResNet-50 has demonstrated exceptional results in image classification tasks19, particularly in scenarios requiring the recognition of complex patterns, such as crater identification48. Using pre-trained weights from large datasets such as ImageNet, ResNet-50 can achieve high accuracy even with limited planetary data, making it a powerful tool for crater detection and classification49. However, the computational cost of ResNet-50 can be high, and its performance could be limited when applied to very high-resolution images or datasets with significant class imbalance.

Figure 4 shows our crater identification pipeline based on the ResNet-50 model19. Each image is first preprocessed using its YOLO annotation coordinates: the crater region is cropped and resized to 128 × 128 pixels. Real-time data augmentation is then performed with Keras’ ImageDataGenerator to improve the generalisation of the model. The model is based on a pre-trained ResNet-50 in which only the top layers are customised: global average pooling, batch normalisation, a fully connected layer with L2 regularisation, and a Dropout layer, which together reduce the risk of overfitting. The Softmax output layer classifies four types of targets, and we exclude non-craters from the analysis. We train the model with the Adam optimiser and the cross-entropy loss function42 while monitoring its performance in real time. We report the accuracy and stability of the model over 30 independent experimental runs.
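The Softmax output and cross-entropy loss named above can be written out explicitly. The NumPy sketch below is illustrative: it assumes one-hot labels over four classes (large, medium, small, non-crater), the function names are hypothetical, and the small clipping constant is an assumed safeguard against log(0) rather than a setting reported in this study.

```python
import numpy as np

def softmax(logits):
    """Row-wise softmax; subtracting the row maximum avoids overflow in exp."""
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def categorical_cross_entropy(one_hot, probs, eps=1e-7):
    """Mean over samples of -sum_k y_k log p_k; probabilities are clipped to avoid log(0)."""
    p = np.clip(probs, eps, 1.0 - eps)
    return float(-(one_hot * np.log(p)).sum(axis=-1).mean())

# Two samples over four classes: [large, medium, small, non-crater]
logits = np.array([[0.0, 0.0, 0.0, 0.0],
                   [1.0, 2.0, 5.0, 0.5]])
labels = np.array([[0, 0, 1, 0],
                   [0, 0, 1, 0]])
probs = softmax(logits)
print(probs.round(3))
print(categorical_cross_entropy(labels, probs))
```

With equal logits every class receives probability 0.25, so the loss for that sample is log 4; a confident, correct prediction (the second sample) contributes a much smaller loss.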

Fig. 4: ResNet model architecture.

A series of convolutional and pooling blocks is used to process the input crater image for classification in Step 5 of the framework (Fig. 1).

Technical details

Our crater detection system combines three models: YOLO-v1150 for object detection, and ResNet-50 and a conventional CNN for crater classification. For YOLO-v11, we use the model definition provided with the Ultralytics project and train it from scratch. ResNet-50 is initialised with ImageNet pre-trained weights19, and all of its convolutional layers are frozen during training. Our CNN, by contrast, is custom-built and trained from scratch without any pre-trained weights. Together, these models allow us to compare performance across the detection and classification tasks.

We implemented the framework in Python, using TensorFlow Keras41 for the CNN and ResNet-50 models and the Ultralytics library for YOLO-v11, supported by NumPy51, OpenCV52, Matplotlib, Seaborn, and Plotly. The models were trained on a workstation with an NVIDIA GeForce RTX 4060 GPU and 16 GB of memory.

We created a custom dataset (named “datasets”) comprising three crater categories (large, medium, and small), with data from Mars and the Moon. We prepared two versions of the dataset: one in the format required by YOLO for object detection, and one in a cropped format (based on YOLO bounding boxes) for the classification models. Each version was divided into training (60%), validation (10%), and test (30%) sets. Table 1 shows the number of samples in each split for the Mars craters, and Table 2 shows the distribution for the Moon.
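The text does not state whether the 60/10/30 split was stratified by class. The Python sketch below shows a per-class (stratified) split as one assumed option, which guarantees that the rare large and medium classes appear in every split; the function name, the use of integer percentages, and the fixed random seed are hypothetical choices. Applied to the Mars class counts, it yields 102 training, 16 validation, and 53 test samples; these are illustrative figures, not the contents of Table 1.

```python
import random
from collections import defaultdict

def stratified_split(labels, pcts=(60, 10, 30), seed=0):
    """Split sample indices into train/val/test separately for each class.

    Integer percentages avoid floating-point rounding surprises; any remainder
    after flooring the train and validation sizes goes to the test split.
    """
    if sum(pcts) != 100:
        raise ValueError("percentages must sum to 100")
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    rng = random.Random(seed)
    train, val, test = [], [], []
    for label in sorted(by_class):
        idxs = by_class[label]
        rng.shuffle(idxs)
        n_train = len(idxs) * pcts[0] // 100
        n_val = len(idxs) * pcts[1] // 100
        train += idxs[:n_train]
        val += idxs[n_train:n_train + n_val]
        test += idxs[n_train + n_val:]
    return train, val, test

# Mars class counts from the text: 135 small, 20 medium, 16 large
mars_labels = ["small"] * 135 + ["medium"] * 20 + ["large"] * 16
train, val, test = stratified_split(mars_labels)
print(len(train), len(val), len(test))  # 102 16 53
```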

Table 1 Distribution of craters by size in the dataset of Mars
Table 2 Distribution of craters by size in a selected region of the Moon

In all experiments, we conduct thirty independent runs and report the mean and standard deviation of the results.

For the classification tasks, we used data augmentation via the Keras ImageDataGenerator, including pixel rescaling, random rotation (±15°), horizontal flipping, and width and height shifting (±10%). We trained the YOLO-v11 model for 200 epochs with a batch size of 32 and an image resolution of 640 × 640. We trained the CNN and ResNet-50 models for 30 epochs with a batch size of 59 and an input resolution of 128 × 128.

Results

Experiment design

We evaluate and compare the performance of three deep learning models (YOLO, CNN, and ResNet-50) on crater detection and classification tasks. The experiments follow a phased design, using NASA’s Mars crater dataset and processed lunar crater images from Roboflow. All crater annotations follow a standard specification and are categorised into three classes by size: large craters (>50 km), medium craters (10–50 km), and small craters (<10 km).

The research data comprise MRO HiRISE high-resolution images of Mars from NASA and lunar crater images from the Roboflow platform. The original images from both sources are very large. As shown in Figs. 5 and 6, we crop and scale the Mars images to a size suitable for model training and relabel them after cropping.

Fig. 5: Crop and scale operation.

Cropping and scaling of the Mars image (Latitude: −25.40° to −25.17°, Longitude: 196.23°E to 196.34°E) for further processing by our machine learning framework. Image data obtained from the Mars Reconnaissance Orbiter (MRO)28.

Fig. 6: Manual data labelling.

Hand-marked craters on Mars (Latitude: −25.28°, Longitude: 196.29°E). Image data from the Mars Reconnaissance Orbiter (MRO)28.

In the YOLO-v1150 experiment, input images are resized to a resolution of 640 × 640 pixels, and the model is trained using the standard YOLO bounding box annotation format. The model architecture fully incorporates the YOLO framework’s backbone, feature fusion layers, and detection head components. We train the model with the Adam optimiser (initial learning rate = 0.001) for 30 epochs, with an early stopping mechanism to prevent overfitting. For large-scale image processing, we test two methods: direct prediction and a sliding window approach. The latter employs a 30% overlap between windows to ensure detection continuity and uses Non-Maximum Suppression (NMS) to eliminate redundant bounding boxes.

The second experiment compares traditional convolutional neural networks (CNNs) with the YOLO framework. We use the same dataset as in Experiment 1, converted into a format suitable for CNN training: individual craters are extracted based on YOLO annotations and resized to 128 × 128 pixels. Training uses the categorical cross-entropy loss function and the Adam optimiser (learning rate = 0.001) for 30 epochs, with ModelCheckpoint and EarlyStopping callbacks41 to prevent overfitting. In the third experiment, we introduce the ResNet-50 model to evaluate the performance of a deeper neural network on the crater classification task.

Results

Tables 3 and 4 show the average rank of the three models (lower is better): CNN (1.67), YOLO (1.84), and ResNet-50 (2.50). The per-category rankings show that YOLO performed best in large crater detection, whereas CNN dominated small crater identification. CNN and YOLO achieved the same average rank on the Mars data (1.67), while CNN led on the lunar data (1.67 vs 2.00 for YOLO). Below, we analyse the reasons for these rankings by examining the precision and recall of each model.
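Average ranks of this kind are obtained by ranking the models within each category and averaging across categories. The Python sketch below is a hypothetical illustration of that procedure: the F1 values are invented for demonstration and are not the results in Tables 3 and 4, the function names are hypothetical, and giving tied models the mean of their ranks is an assumed convention.

```python
def ranks_desc(values):
    """Rank values so that the largest gets rank 1; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j share ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def average_ranks(scores):
    """scores maps model -> list of per-category scores (higher is better)."""
    models = list(scores)
    n_categories = len(scores[models[0]])
    totals = dict.fromkeys(models, 0.0)
    for c in range(n_categories):
        for model, rank in zip(models, ranks_desc([scores[m][c] for m in models])):
            totals[model] += rank
    return {m: totals[m] / n_categories for m in models}


# Hypothetical per-category F1-scores (large, medium, small); not the paper's results
scores = {
    "CNN": [0.50, 0.60, 0.97],
    "YOLO": [0.70, 0.65, 0.63],
    "ResNet-50": [0.40, 0.55, 0.90],
}
print({m: round(r, 2) for m, r in average_ranks(scores).items()})
```

Averaging ranks rather than raw scores means that a model winning two categories narrowly and a model winning one category by a wide margin can end up tied.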

Table 3 Performance (Rank) of different models for crater detection of Mars
Table 4 Performance (Rank) of different models for crater detection of the Moon

First, we apply CNN-based crater classification to the Mars and Moon data, preprocessing the data into a format suitable for the CNN. During training, we use the ModelCheckpoint and EarlyStopping callbacks to retain a well-performing model and avoid overfitting.

After training the CNN model on the collected crater data, the classification performance reported in Tables 5 and 6 reveals a marked disparity in the model’s ability to recognise different crater categories. Category 1 (small craters) performs exceptionally well, achieving an F1-score of around 0.97 with near-perfect precision and recall, indicating almost no misclassification of small craters. Figure 7 presents the precision-recall (PR) curves of the CNN model for a typical training run; Class 1 has the largest area under the curve (AUC), indicating the best performance of the three classes. Figure 8 presents the training losses of the CNN model for a typical training run, in which the training loss decreases consistently over time. Validation loss also decreases gradually for Mars, but less so for the Moon. Note that the plots end at epochs 12 and 18, respectively, because of early stopping.

Fig. 7: CNN model training for understanding class imbalance.

The PR curves for the CNN model in a typical model training run show that Class 1 performance is distinctly different from the others.

Fig. 8: CNN model training.

Training losses of the CNN model for a typical model training run, where the training loss consistently decreases over time. Similarly, validation loss decreases gradually for Mars but less so for the Moon.

Table 5 CNN report for crater detection on Mars (Mean ± Std)
Table 6 CNN report for crater detection on the Moon (Mean ± Std)

In terms of overall results (Tables 5 and 6), the model achieves a weighted accuracy of 0.91 on the Moon and 0.82 on Mars, which appears strong at first glance. However, its macro-average F1-scores are only 0.57 and 0.56, respectively, highlighting a critical issue: although the model excels at classifying the majority class (small craters), its performance varies considerably across categories. This discrepancy likely stems from severe class imbalance in the training data, in which small craters dominate while large craters and non-crater regions are underrepresented. In summary, although the CNN model classifies small craters exceptionally well, its performance on the other categories requires further improvement.
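The gap between a high overall score and a low macro-average F1-score follows directly from how the averages are formed. The Python sketch below uses an invented three-class confusion matrix (rows are true classes, columns are predictions) in which small craters make up 90% of the samples; the numbers and function names are hypothetical and do not come from Tables 5 and 6. Macro-averaging weights each class equally, whereas weighted averaging weights each class by its support, so a poorly classified minority class pulls down mainly the macro score.

```python
def per_class_f1(cm):
    """F1 for each class from a confusion matrix cm[true][pred], using F1 = 2TP / (2TP + FP + FN)."""
    k = len(cm)
    scores = []
    for c in range(k):
        tp = cm[c][c]
        fp = sum(cm[r][c] for r in range(k)) - tp
        fn = sum(cm[c]) - tp
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return scores


def summarise(cm):
    """Return (accuracy, macro-average F1, support-weighted F1)."""
    f1 = per_class_f1(cm)
    support = [sum(row) for row in cm]
    n = sum(support)
    accuracy = sum(cm[c][c] for c in range(len(cm))) / n
    macro_f1 = sum(f1) / len(f1)
    weighted_f1 = sum(f * s for f, s in zip(f1, support)) / n
    return accuracy, macro_f1, weighted_f1


# Hypothetical confusion matrix; class order: small, medium, large
cm = [[90, 0, 0],
      [3, 2, 0],
      [5, 0, 0]]
acc, macro, weighted = summarise(cm)
print(f"accuracy={acc:.2f} macro-F1={macro:.2f} weighted-F1={weighted:.2f}")
# accuracy=0.92 macro-F1=0.51 weighted-F1=0.89
```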

Next, we train the YOLO-11 model to detect craters in three categories: large, medium, and small. Figures 9 and 10 show the crater data before and after annotation. Based on 30 independent training runs, the performance of the YOLO model varies with crater size. As shown in Tables 7 and 8, the detection of large craters (Class 0) has the highest recall, at 0.78 ± 0.09 on the Moon and 0.90 ± 0.06 on Mars, but relatively low precision, which may imply more false alarms in this category. Medium craters (Class 2) show the most balanced performance, with higher precision and recall and the best overall performance. Small craters (Class 1) have the lowest F1-score, at 0.63 ± 0.01 on the Moon and 0.67 ± 0.03 on Mars, which indicates that small targets are hard to identify. Overall, detection is somewhat more stable for medium-sized targets. The results indicate that the YOLO model detects large craters with high recall and stability, while its performance on small craters remains insufficient.
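
The Mean ± Std entries in the tables summarise a metric over the independent training runs. A minimal sketch of that aggregation is shown below. It assumes the sample (n − 1) standard deviation, since the estimator used is not stated here, and the three per-run scores are toy values rather than results from this study.

```python
import statistics


def summarise_runs(scores):
    """Format per-run metric values as 'mean ± std'.

    Assumption: the sample standard deviation (n - 1 denominator) is used.
    """
    mean = statistics.mean(scores)
    std = statistics.stdev(scores)
    return f"{mean:.2f} ± {std:.2f}"


# Toy per-run F1-scores (hypothetical, not from Tables 7 and 8)
print(summarise_runs([0.60, 0.65, 0.64]))  # 0.63 ± 0.03
```

With 30 runs per model, as in this study, the same function applies unchanged to each per-class metric column.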

Fig. 9: Mars study area.

NASA raw Mars data map (Latitude: −25.40° to −25.17°, Longitude: 196.23° to 196.34°E). Image data from the Mars Reconnaissance Orbiter (MRO)28.

Fig. 10: Mars crater classification results.

Sliding window statistics (Latitude: −1.60° to −1.27°, Longitude: 318.04° to 318.16°E). Image data from the Mars Reconnaissance Orbiter (MRO)28. The colours indicate different types of craters: blue (large craters), red (medium craters), and green (small craters).

Table 7 YOLO report for crater detection on the Moon (Mean ± Std)
Table 8 YOLO performance for crater detection on Mars (Mean ± Std)

Figure 11 presents the PR curves of the YOLO model for a typical training run. The average precision at an Intersection over Union (IoU) threshold of 0.5 is evidently higher for Mars than for the Moon. Figure 12 presents the mean average precision (mAP@0.50–0.95) curve of the YOLO model for a typical training run, which increases gradually throughout the 200 epochs for both the Moon and Mars. This metric averages the average precision over 10 IoU thresholds (0.50 to 0.95 in steps of 0.05) and then takes the mean over all classes. For both Mars and the Moon, the model finishes with an mAP between 0.6 and 0.7. Because the stricter thresholds require predicted boxes to overlap the ground truth closely, this suggests that the localisation quality of the model may not be satisfactory, despite the good detection precision shown by the PR curves (Fig. 11).
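
The difference between AP at IoU = 0.5 and mAP@0.50–0.95 can be seen from a single prediction. The sketch below computes the IoU of a predicted box with its ground-truth box and counts how many of the 10 standard thresholds it passes. It is a simplification that checks only one matched pair rather than computing full AP, and the box coordinates are hypothetical.

```python
def iou(box_a, box_b):
    """Intersection over Union of two boxes given as (x_min, y_min, x_max, y_max)."""
    inter_w = max(0.0, min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]))
    inter_h = max(0.0, min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]))
    inter = inter_w * inter_h
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


# The 10 IoU thresholds behind mAP@0.50-0.95: 0.50, 0.55, ..., 0.95
IOU_THRESHOLDS = [round(0.50 + 0.05 * i, 2) for i in range(10)]


def thresholds_passed(iou_value):
    """Number of thresholds at which a matched prediction counts as a true positive."""
    return sum(iou_value >= t for t in IOU_THRESHOLDS)


ground_truth = (0, 0, 10, 10)
prediction = (1, 1, 11, 11)  # hypothetical box shifted by one pixel in x and y
overlap = iou(ground_truth, prediction)
print(f"IoU={overlap:.3f}, true positive at {thresholds_passed(overlap)}/10 thresholds")
```

Here the IoU is 81/119, about 0.68. The prediction is a true positive at IoU = 0.5 but at only 4 of the 10 thresholds, so slightly misplaced boxes lower mAP@0.50–0.95 even when precision at IoU = 0.5 is high.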

Fig. 11: YOLO classification of Mars and the Moon.

PR curves of the YOLO model for a typical model training run. The average precision (mAP at IoU = 0.5) of the model for Mars is evidently better than for the Moon.

Fig. 12: YOLO mAP model training performance.

mAP@0.50–0.95 of the YOLO model for a typical training run for both the Moon and Mars.

Finally, we train the ResNet-50 model to classify craters in the Mars and Moon data (Tables 9 and 10). The model shows consistent classification performance over 30 independent training and test runs. ResNet-50 performs very well for small craters, with high precision (0.95 ± 0.03 for the Moon and 0.80 ± 0.05 for Mars), recall (0.99 ± 0.01 for the Moon and 0.95 ± 0.16 for Mars), and F1-score (0.97 ± 0.01 for the Moon and 0.86 ± 0.10 for Mars). For medium and large craters, however, the results are mixed. Although precision is good, the very low recall and F1-scores show that the model struggles to find these larger craters reliably: it often misses them, but when it does detect them, the prediction is usually correct. Figure 13 presents the PR curves of the ResNet-50 model for a typical training run, where Class 1 has the largest AUC, indicating the best performance of the three classes.
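
Good precision combined with very low recall arises when a class is rarely predicted but most of its predictions are correct, while most of its true samples are assigned to another class. The sketch below derives per-class precision and recall from a confusion matrix. The counts are invented for illustration, with most large and medium craters misclassified as small, and are not taken from Tables 9 and 10.

```python
def per_class_precision_recall(cm):
    """Per-class (precision, recall) from a square confusion matrix.

    cm[i][j] is the number of samples whose true class is i and whose
    predicted class is j.
    """
    n = len(cm)
    results = []
    for k in range(n):
        tp = cm[k][k]
        predicted_k = sum(cm[i][k] for i in range(n))  # column sum: all predictions of class k
        actual_k = sum(cm[k])  # row sum: all true samples of class k
        precision = tp / predicted_k if predicted_k else 0.0
        recall = tp / actual_k if actual_k else 0.0
        results.append((precision, recall))
    return results


# Hypothetical counts, classes ordered as 0 = large, 1 = small, 2 = medium
cm = [
    [10, 30, 0],  # true large: most mistaken for small
    [1, 895, 4],  # true small
    [0, 45, 15],  # true medium: most mistaken for small
]
for name, (p, r) in zip(["large", "small", "medium"], per_class_precision_recall(cm)):
    print(f"{name}: precision={p:.2f}, recall={r:.2f}")
```

For these counts, large and medium craters reach precisions of about 0.91 and 0.79 but recall of only 0.25, whereas small craters achieve both high precision and high recall.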

Fig. 13: Understanding class imbalance with ResNet model training.

PR curves of the ResNet model for a typical model training run for the case of Mars and the Moon.

Table 9 ResNet-50 performance for crater detection on the Moon (Mean ± Std)
Table 10 ResNet-50 performance for crater detection on Mars (Mean ± Std)

Comparison of models

To compare how well the different models detect craters, we tested CNN, YOLO, and ResNet-50 on the same dataset with consistent evaluation metrics. The CNN model performs strongly in small crater detection, with an F1-score of 0.97 ± 0.01, and shows moderate capability for medium craters (F1-score: 0.65 ± 0.06). However, its performance on large craters remains limited (F1-score: 0.70 ± 0.05), suggesting difficulty in capturing broader contextual features.

YOLO exhibits the most balanced performance across all crater sizes, with F1-scores of 0.82 ± 0.04 (large), 0.81 ± 0.05 (medium), and 0.78 ± 0.03 (small). This consistency highlights YOLO's effectiveness in multi-scale detection tasks, likely owing to its integrated feature pyramid network and anchor-free detection mechanism. ResNet-50 shows excellent performance in small crater recognition (F1-score: 0.95 ± 0.02) and significantly improved capability for large craters (F1-score: 0.74 ± 0.06) compared to previous implementations. Its medium crater detection (F1-score: 0.72 ± 0.05) may benefit from its deep residual learning architecture.

Discussion

We systematically compared the performance of three deep learning models, namely CNN, YOLO, and ResNet-50, in crater detection tasks, and examined how the different architectures behave in multi-scale object detection. Our results demonstrate that the YOLO model has a clear advantage in overall performance balance, which is likely due to its mechanism for combining global and local features. The CNN model performs well in detecting small craters, achieving an F1-score of 0.98 ± 0.00, which is closely related to its design focus on local feature extraction. Although the ResNet-50 model also performs well on small craters, its ability to identify large craters is inadequate, which reflects the inherent challenges deep neural networks face when handling multi-scale targets.

The YOLO results demonstrate the value of multi-scale adaptability: the model achieves an F1-score of 0.70 ± 0.08 for large craters and 0.76 ± 0.05 for medium craters, indicating that this architecture can effectively capture feature information from targets at different scales. In contrast, although the traditional CNN model10 offers higher computational efficiency, its performance on large-scale targets is inadequate, which is consistent with its design focus on local feature extraction. The performance of the ResNet-50 model shows that simply increasing model depth does not always lead to comprehensive performance improvements, especially for domain-specific tasks.

Class imbalance is a critical issue observed in the experimental results. Because small crater samples predominate in the dataset, the performance of all three models in detecting medium and large craters is constrained to varying degrees. This is consistent with recent related studies4, which highlight the widespread issue of uneven sample distribution in planetary surface feature detection. In addition, the hybrid strategy that combines sliding windows with non-maximum suppression (NMS) effectively improves the detection accuracy for small craters, but it also increases computational resource consumption. This trade-off between performance and efficiency requires careful consideration in practical applications.
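
The sliding-window and NMS strategy can be sketched as follows: the image is split into overlapping windows, detections from each window are shifted back into full-image coordinates, and greedy NMS removes duplicates of the same crater found in neighbouring windows. This is a minimal illustration, not the exact implementation used in this study. The window size (640 px), stride (320 px), IoU threshold (0.5), boxes, and scores are all hypothetical.

```python
def window_origins(length, window, stride):
    """Start offsets of sliding windows along one image axis, covering its full length."""
    if length <= window:
        return [0]
    origins = list(range(0, length - window + 1, stride))
    if origins[-1] != length - window:
        origins.append(length - window)  # add a final window flush with the far edge
    return origins


def to_global(box, x0, y0):
    """Shift a window-local box (x_min, y_min, x_max, y_max) into full-image coordinates."""
    return (box[0] + x0, box[1] + y0, box[2] + x0, box[3] + y0)


def iou(a, b):
    """Intersection over Union of two (x_min, y_min, x_max, y_max) boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, iou_threshold=0.5):
    """Greedy NMS: keep the highest-scoring box, drop boxes that overlap it, repeat."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        order = [i for i in order if iou(boxes[best], boxes[i]) < iou_threshold]
    return keep


# A 1000-pixel-wide image tiled with 640-pixel windows and a 320-pixel stride
print(window_origins(1000, 640, 320))  # [0, 320, 360]

# The same crater seen in two overlapping windows, plus one distinct crater
boxes = [
    to_global((330, 100, 380, 150), 0, 0),  # from the window starting at x = 0
    to_global((12, 101, 61, 149), 320, 0),  # from the window starting at x = 320
    to_global((200, 300, 240, 340), 360, 0),  # from the window starting at x = 360
]
scores = [0.90, 0.85, 0.70]
print(nms(boxes, scores))  # [0, 2]: the duplicate detection is suppressed
```

Smaller strides give more overlap and fewer craters cut at window edges, but they also multiply the number of windows to process, which is the source of the computational cost noted above.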

The primary contribution of this study lies in establishing a complete evaluation framework for deep learning models, ensuring the reliability of the conclusions through strictly controlled experimental conditions and repeated validation. The optimisation of the sliding window strategy significantly improves the detection accuracy of small craters, providing an important reference for subsequent research. Additionally, the class imbalance issue revealed by the study points the way for future improvements in data collection and annotation. These findings not only offer direct guidance for feature detection research in planetary science but also provide valuable insights for the development of multi-scale object detection algorithms in the field of computer vision.

In terms of limitations, the selected region is small, which posed challenges for model training. We used a small region to demonstrate the effectiveness of the models; future work could consider larger regions. Future research could also address several key issues identified in this study. At the model level, more efficient architectures are needed to balance detection accuracy and computational cost. At the data level, smarter sample augmentation techniques should be developed to reduce class imbalance. At the application level, cross-planetary transfer learning should be explored to improve the applicability of the models. Progress in these areas would significantly advance automated planetary surface analysis and provide more robust technical support for deep space exploration missions. In future work, we will investigate more complex network architectures, such as ResNet-50, and advanced strategies such as model fusion and transfer learning53 to further improve the accuracy and generality of crater detection in support of space exploration and geological research.
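
As a simple baseline for the data-level issue, random oversampling duplicates minority-class samples until every class matches the majority count. The sketch below is not one of the smarter augmentation techniques envisaged above. In practice, each duplicated image tile would typically also be transformed (for example, flipped or rotated) so that it is not an exact copy. The labels and sample identifiers are hypothetical.

```python
import random
from collections import Counter


def random_oversample(samples, labels, seed=0):
    """Duplicate randomly chosen minority-class samples until all classes are equally frequent."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    # Group samples by class label
    by_class = {}
    for sample, label in zip(samples, labels):
        by_class.setdefault(label, []).append(sample)
    out_samples, out_labels = list(samples), list(labels)
    for label, items in by_class.items():
        for _ in range(target - counts[label]):
            out_samples.append(rng.choice(items))
            out_labels.append(label)
    return out_samples, out_labels


# Toy labels: 1 = small (dominant), 2 = medium, 0 = large
labels = [1] * 8 + [2] * 4 + [0] * 2
samples = [f"tile_{i}" for i in range(len(labels))]  # hypothetical tile identifiers
new_samples, new_labels = random_oversample(samples, labels)
print(Counter(new_labels))  # every class now has 8 samples
```

Oversampling only rebalances what the model sees during training. Because it adds no genuinely new large or medium craters, it complements rather than replaces better data collection and annotation.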


We presented a robust deep learning framework for automated crater classification that uses a two-stage process and three models: CNN, ResNet-50, and YOLO. The results demonstrated the effectiveness of these models in classifying craters into three size classes. After cropping and scaling the images, we used the YOLO model to detect the locations and boundaries of the craters and produced labels for each processed image. We then applied YOLO again, together with CNN and ResNet-50, to classify crater sizes, evaluated appropriate metrics that take class imbalance into account, and identified which model is best suited to the task. The CNN model achieves near-perfect recognition accuracy for small craters but performs poorly on large and medium craters, especially for the Moon dataset, which may be explained by the severe class imbalance between small and larger craters. In contrast, YOLO achieved balanced detection of multi-scale craters across all three classes while maintaining high computational efficiency, possibly owing to its feature extraction mechanism and hybrid prediction strategy. Although CNN was the best model for small crater identification, YOLO surpassed it for the larger crater categories. Moreover, the deeper ResNet-50 did not deliver uniform gains across classes, which reflects that increasing network depth alone does not necessarily improve overall detection performance, particularly for multi-class detection tasks with imbalanced sample distributions.
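
The labelling step of the two-stage process can be illustrated with the YOLO text label format, which stores one line per object as the class index followed by the box centre, width, and height, each normalised by the image size. The sketch below converts a pixel box into such a line using the class indices of this study (0 = large, 1 = small, 2 = medium). The diameter thresholds that assign a size class are hypothetical placeholders, since the size boundaries are not given in this section.

```python
# Class indices as used in this study: 0 = large, 1 = small, 2 = medium
SMALL_MAX_PX = 20  # hypothetical size thresholds in pixels, not taken from the paper
MEDIUM_MAX_PX = 60


def size_class(diameter_px):
    """Map an approximate crater diameter in pixels to a size class index."""
    if diameter_px <= SMALL_MAX_PX:
        return 1
    if diameter_px <= MEDIUM_MAX_PX:
        return 2
    return 0


def yolo_label_line(box, img_w, img_h):
    """One YOLO-format label line: class x_center y_center width height (normalised to 0-1)."""
    x_min, y_min, x_max, y_max = box
    w, h = x_max - x_min, y_max - y_min
    cls = size_class(max(w, h))  # use the larger box side as the diameter estimate
    x_c = (x_min + x_max) / 2 / img_w
    y_c = (y_min + y_max) / 2 / img_h
    return f"{cls} {x_c:.6f} {y_c:.6f} {w / img_w:.6f} {h / img_h:.6f}"


# A 40 x 40 pixel crater in a 640 x 640 crop is labelled as medium (class 2)
print(yolo_label_line((100, 200, 140, 240), 640, 640))
# 2 0.187500 0.343750 0.062500 0.062500
```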