Deep learning framework for crater detection and identification on the Moon and Mars

Ma, Yihan; Guo, Jessie; Yu, Zeyang; Chandra, Rohitash

doi:10.1038/s44453-026-00036-x

Article
Open access
Published: 17 April 2026

Yihan Ma¹,
Jessie Guo^1,2,
Zeyang Yu¹ &
…
Rohitash Chandra^1,2

npj Space Exploration volume 2, Article number: 19 (2026)

Abstract

Impact craters are among the most prominent geomorphological features on planetary surfaces and are of substantial significance in planetary science research. Recently, the rapid advancement of deep learning models has fostered significant interest in automated crater detection. In this paper, we apply advanced deep learning models for impact crater detection and identification using Convolutional Neural Networks (CNNs) and their variants, including YOLO and ResNet. We present a framework that features a two-stage approach, where the first stage employs YOLO for crater detection and localisation. In the second stage, our framework features crater classification using CNN, ResNet and YOLO. We thereby detect and identify different types of craters and present a summary report with remote sensing data for a selected region. We apply the framework to selected regions of Mars and the Moon using remote sensing data. Our results indicate that YOLO demonstrates the most balanced crater detection performance, while ResNet excels in identifying large craters with high precision. However, ResNet reported poor performance for large and medium craters for both Mars and the Moon, while CNN achieved the best performance for small craters on Mars.

Introduction

The automatic detection of craters is a fundamental task in planetary science and has significant implications for geological analysis¹, spacecraft navigation², and planetary surface exploration³. The identification of craters is essential for spacecraft navigation in hazardous terrains for exploring planetary resources. Traditional crater identification methodologies primarily rely on manual visual interpretation, which is labour-intensive, time-consuming, and susceptible to subjective biases. Crater detection methods can be broadly categorised into traditional computer vision and machine learning⁴ and deep learning-based approaches⁵. Traditional computer vision methods typically rely on manually designed features such as edges, contours, and shaded regions, and employ template matching and machine learning classifiers⁶. However, these methods face substantial limitations when dealing with craters of varying sizes and morphological characteristics⁷.

Deep learning methods eliminate the need for manual feature extraction by learning data representations automatically⁸, which significantly enhances the accuracy and adaptability of crater detection⁹. Deep learning has revolutionised computer vision, enabling significant advances in tasks such as image classification, object detection, and segmentation⁸. Deep learning models are capable of autonomously learning and extracting relevant features directly from image data¹⁰. For example, Silburt et al.¹¹ applied U-Net to lunar data sets and achieved high levels of precision in their attempts to detect craters. Yang et al.¹² used high-resolution feature pyramid networks for the detection of small-scale craters, which enabled the crater detection task to be effective for small targets. Deep learning models have demonstrated superior accuracy and generalisation capability when applied to large-scale, high-resolution remote sensing data sets¹³.

Deep learning models such as convolutional neural networks (CNNs)¹⁴ are widely used in visual recognition tasks such as image classification, face recognition, image denoising, and scene recognition. Hence, CNNs with semantic segmentation models have been used for high-resolution identification and characterisation of craters from imagery¹¹. Furthermore, CNN variants such as U-Net¹⁵ and Faster R-CNN¹⁶ have been fine-tuned to cope with the wide variation in the sizes, shapes, and degradation states of craters¹⁷. Among these, CNN-based models such as YOLO (You Only Look Once)¹⁸ have been used for real-time object detection with high accuracy.

Residual networks (ResNet)¹⁹ have been prominent for classification tasks and are often used as the backbone feature extractor for YOLO. ResNet and its variants (such as ResNet-50) are deep CNNs enhanced with residual connections that mitigate the vanishing gradient problem common in conventional CNNs. This permits a deeper network structure and greatly increases feature expression capability and recognition accuracy. Researchers commonly employ ResNet-50 for image classification and as a feature extraction network for target detection and image segmentation, because its deep residual architecture enables effective training of deep networks²⁰. It also performs well in more complicated visual tasks, such as medical diagnoses²¹ and industrial applications²². Since crater shapes vary and have fuzzy boundaries, traditional CNNs have difficulty extracting complex features effectively. Several studies have trained deep learning and machine learning models for crater detection using data from NASA (National Aeronautics and Space Administration)^9,23. Owing to its deep residual structure, ResNet has the potential to cope better with strong background interference, and hence to improve the robustness and accuracy of classification. Faster R-CNN¹⁶ is a two-stage target detection model that improves the accuracy and efficiency of target detection by sharing convolutional features and regressing candidate boxes. It has been applied with high precision in realistic scenarios such as remote sensing-based target location²⁴, traffic sign recognition for automatic driving²⁵, and medical focus detection²⁶.

Furthermore, machine learning paradigms such as transfer learning²⁷ have enabled models trained on low-resolution datasets to be effective on high-resolution datasets. Yang et al.³ utilised transfer learning for the study of the Moon with data from the Chinese Chang’E missions. Using Chang’E-1 and Chang’E-2 data¹¹, Yang et al.³ applied deep and transfer learning to identify over 117,000 lunar craters and estimated ages for nearly 19,000. The authors utilised a two-stage CNN-based model for multiscale detection and age classification, integrating morphological and stratigraphic features with high accuracy. Such models are expected not only to expand and sharpen crater catalogues, but also to provide background on the impact history and geological development needed to plan space exploration and missions.

A major difficulty in crater detection is data labelling: high-quality labelled datasets are relatively rare, and manual annotation is laborious and costly. Another problem is scale variation; since crater diameters range from a few metres to a few hundred kilometres, models must handle very different scales effectively. Finally, topographic complexity has proven to be a stumbling block for generalisation, given the different surface features of different planetary bodies⁴. Hence, models trained on one dataset do not perform well on others, especially across different planetary bodies, due to differences in data resolution, lighting conditions, and terrain features⁹. Limited computational resources and class imbalance also hinder model performance, especially when training deep networks or detecting rare crater types in unbalanced datasets. However, novel deep learning models such as ResNet and YOLO have the potential to address some of these limitations.

In this paper, we present a robust deep learning framework for automating crater identification and classification. Our framework features a two-stage process, whereby YOLO is used for detection, and a set of deep learning models is used for classification of different types of craters. Hence, we evaluate novel deep learning models (ResNet, CNN, and YOLO) for crater classification. We provide an annotated crater detection dataset, which is used to train the respective models to classify craters as large, medium, or small. The framework aims to enhance our understanding of planetary surface processes, which is crucial for critical tasks in space exploration, such as landing site selection and resource assessment. Finally, our framework enables us to generate a report that provides statistics about the types of craters in a given region. We apply the framework to selected regions of Mars and the Moon using remote sensing data.

The rest of this paper is organised as follows. The Introduction provides a background on crater exploration, and Methods introduces the methodology, including data processing and modelling. In Results, we present the results, followed by a Discussion that explains potential factors that influenced the outputs.

Methods

NASA dataset

NASA is a United States federal agency popularly known for managing space programmes and spearheading aerospace and space science research. NASA is well known for sharing data with national and international research institutions and has been at the forefront of space research globally.

The dataset utilised in this study consists of high-resolution satellite images of planetary surfaces, primarily focusing on Mars and Mercury. These images were sourced from NASA’s publicly accessible archives, which include data from missions such as the Mars Reconnaissance Orbiter (MRO)²⁸, and the MESSENGER (MErcury Surface, Space ENvironment, GEochemistry, and Ranging) spacecraft²⁹. These missions have provided extensive visual data of planetary surfaces, capturing a wide range of geological features. The dataset encompasses images with diverse resolutions, varying lighting conditions, and different surface characteristics, making it well-suited for training and evaluating deep learning models aimed at crater detection and classification tasks¹¹.

Roboflow universe dataset

Roboflow Universe (https://universe.roboflow.com/) is a community-driven collaborative platform through which the computer vision community shares high-quality datasets³⁰ and pre-trained models. Roboflow Universe assembles thousands of openly sourced, annotated, and collated datasets, enabling users to bypass the tedious process of data preparation and cleaning. Supported tasks include object detection, image segmentation, classification, and object tracking. It also hosts many widely used models, such as YOLO and Faster R-CNN. Users can download the pre-trained models or try them online, which enables fast prototyping and application deployment. Roboflow Universe also provides a full API and a simple Software Development Kit (SDK) to support quick integration and deployment of models on web, mobile, or edge devices, offering one-stop computer vision development. In this study, Roboflow gives us free access to lunar crater image datasets created or shared by others in the community. The lunar crater data on Roboflow Universe come from different Moon exploration missions and varied imaging conditions. This diversity can improve the generalisation performance of the trained model and enhance its robustness in real scenes.

Manual Data Labelling

We analyse crater datasets from Mars and the Moon, which were obtained from the NASA Planetary Data System³¹. The Mars region covers the mid-low southern latitudes (Latitude: -25.40° to -25.17°, Longitude: 196.23°E to 196.34°E), while the Moon region covers the western edge of the equatorial region on the near side (Latitude: -1.60° to -1.27°, Longitude: 318.04°E to 318.16°E). We categorised the craters into three size groups based on their diameter: small (<10 km), medium (10-50 km), and large (>50 km). The Mars dataset contains 135 small, 20 medium, and 16 large craters (171 total). The Moon dataset includes 611 small, 45 medium, and 15 large craters (671 total). We provide our labelled data using a GitHub repository (https://github.com/sydney-machine-learning/crater-identification).
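The size grouping and the stated counts can be illustrated with a minimal sketch. The function name `size_class` is hypothetical, and treating exactly 10 km and 50 km as medium is an assumption, since the text only gives the ranges.

```python
def size_class(diameter_km: float) -> str:
    """Map a crater diameter in km to one of the three size groups."""
    if diameter_km < 10:
        return "small"
    if diameter_km <= 50:  # assumption: both 10 km and 50 km count as medium
        return "medium"
    return "large"


# Crater counts per size group, as stated in the text.
counts = {
    "Mars": {"small": 135, "medium": 20, "large": 16},
    "Moon": {"small": 611, "medium": 45, "large": 15},
}

# Totals per body and the share of small craters, which shows the imbalance.
totals = {body: sum(groups.values()) for body, groups in counts.items()}
small_share = {body: counts[body]["small"] / totals[body] for body in counts}
```

With these counts, small craters make up roughly 79% of the Mars set and 91% of the Moon set, which is relevant to the class imbalance examined in the Results.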

Framework

We consider selected regions of Mars and the Moon for crater detection and identification. Our framework features an advanced YOLO model (namely the YOLO-v11n variant of YOLO-v11) for crater detection. We use two deep learning models for large remote sensing image datasets. We highlight that conventional deep learning models, such as CNNs for object detection and identification, take a fixed window size of the input image. This can be problematic in real applications, such as crater detection, where the object (crater) size varies and may not fit in a single input image. In such cases, an input image may feature incomplete craters, which would be difficult to identify. Moreover, we have a two-stage problem: in the first stage, our framework needs to detect the crater, and in the second stage, it identifies the type of crater, e.g., small, medium, and large craters, or incomplete craters. We build on a relevant study to construct an effective crater detection framework: Neubeck and Van Gool³² presented the Non-Maximum Suppression (NMS) method, which helps us remove overlapping boxes and improve detection accuracy. In this way, we handle the problem of different crater sizes and crowded regions in planetary images more effectively.

Figure 1 presents a framework comprising three deep learning models (CNN, YOLO, and ResNet-50) for crater detection and identification on Mars and the Moon. The first step features data acquisition and preprocessing with a dataset obtained from two different sources: high-resolution images of the Martian surface from the NASA official website³³ and Moon craters from the Roboflow public platform³⁴. Due to differences in image sizes, we preprocessed each source separately. We cropped and resized the Mars images and then relabelled them to meet the model’s input requirements. We enlarged the lunar images by padding them with black pixels and then relabelled them so that all images have a uniform size.
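Padding an image changes the normalised coordinates of its YOLO labels, which is one reason relabelling is needed. The following is a minimal sketch of that coordinate update; the function name `pad_yolo_box` and the centred placement in the example are assumptions, since the text does not state where the original image sits on the padded canvas.

```python
def pad_yolo_box(box, src_size, dst_size, offset=(0, 0)):
    """Re-normalise a YOLO box after the image is pasted onto a larger canvas.

    box      : (x_centre, y_centre, width, height), normalised to the source image
    src_size : (width, height) of the source image in pixels
    dst_size : (width, height) of the padded canvas in pixels
    offset   : (x, y) pixel position of the source image's top-left corner
    """
    xc, yc, bw, bh = box
    src_w, src_h = src_size
    dst_w, dst_h = dst_size
    off_x, off_y = offset
    return (
        (xc * src_w + off_x) / dst_w,  # centre x in canvas coordinates
        (yc * src_h + off_y) / dst_h,  # centre y in canvas coordinates
        bw * src_w / dst_w,            # pixel width is unchanged, so it shrinks relatively
        bh * src_h / dst_h,
    )


# Hypothetical example: a 320 x 320 image centred on a 640 x 640 black canvas.
padded = pad_yolo_box((0.5, 0.5, 0.2, 0.2), (320, 320), (640, 640), offset=(160, 160))
```

In this example the box centre stays at the middle of the canvas while its normalised width and height halve, because the crater keeps its pixel size on a canvas twice as wide.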

**Fig. 1: Machine learning framework for crater detection and identification.**

In the second step, we perform data visualisation and annotation correction to ensure data quality and validity. We carry out detailed data visualisation analysis on all processed data to observe the distribution characteristics of craters. Also, we pay special attention to the problems arising from manual annotation, such as large craters being incorrectly labelled as medium-sized ones. We then correct and relabel them one by one. In Step three, we report statistical insights to give an overview of the data, including the distribution of the different types (classes) of craters. Step four involves conducting YOLO model training for the detection of craters using a two-class approach, i.e., crater vs background. To facilitate the ability of YOLO to process large images, we design two different prediction strategies:

Direct prediction method: We input each high-definition 4K resolution image into the YOLO model as a whole for detection. This approach detects large craters efficiently and accurately without the need for additional overlapping detection boxes.
Sliding window prediction method: We break down the image into small regions (chips) of 640 × 640 pixels with a 30 percent overlap between neighbouring regions so that no targets are missed. This can increase the recognition accuracy of small craters, but it is weaker at detecting large craters that are not captured within a single chip. We further adopt the Non-Maximum Suppression (NMS)³² technique, which removes redundant detection boxes created by the sliding window method and therefore improves the efficiency and accuracy of the end-to-end prediction.
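The sliding window strategy with NMS can be sketched as follows. This is an illustrative outline rather than the authors' implementation: the function names are hypothetical, the greedy NMS variant and the IoU threshold of 0.5 are assumptions (the text does not state the threshold), and placing the final chip flush with the image edge is one of several possible ways to cover the border.

```python
def window_origins(length, chip=640, overlap=0.30):
    """Start offsets of chips along one axis; the last chip is flush with the edge."""
    if length <= chip:
        return [0]
    stride = round(chip * (1 - overlap))  # 448 px for 640 px chips at 30% overlap
    origins = list(range(0, length - chip + 1, stride))
    if origins[-1] != length - chip:
        origins.append(length - chip)  # cover the remaining border
    return origins


def to_global(box, origin_x, origin_y):
    """Shift a chip-local (x1, y1, x2, y2) box into full-image coordinates."""
    x1, y1, x2, y2 = box
    return (x1 + origin_x, y1 + origin_y, x2 + origin_x, y2 + origin_y)


def iou(a, b):
    """Intersection over Union of two (x1, y1, x2, y2) boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def nms(detections, iou_threshold=0.5):
    """Greedy NMS over (box, score) pairs: keep the best box, drop heavy overlaps."""
    kept = []
    for box, score in sorted(detections, key=lambda d: d[1], reverse=True):
        if all(iou(box, kept_box) <= iou_threshold for kept_box, _ in kept):
            kept.append((box, score))
    return kept


# Hypothetical detections: the same crater found in two overlapping chips,
# plus one distinct crater elsewhere in the image.
detections = [
    (to_global((100, 100, 200, 200), 0, 0), 0.90),
    (to_global((-340, 105, -240, 205), 448, 0), 0.80),
    ((900, 900, 950, 950), 0.70),
]
merged = nms(detections)
```

Here the second detection maps to almost the same full-image box as the first, so NMS keeps only the higher-scoring copy and the distinct crater.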

In the fifth step, we focus on crater classification involving three classes (large, medium, and small craters). We compare CNN, ResNet and YOLO in order to find which model suits the task. We implemented a specialised data loading and preprocessing process for the CNN model since in the YOLO dataset format, non-crater images were separately labelled. In the sixth step, we evaluate the given models using F1-score, precision, recall, and accuracy metrics.

CNN model

In planetary science, deep learning models have been increasingly adopted for tasks such as crater detection, terrain classification, and rock identification¹¹. Traditional methods for crater detection, which often rely on hand-crafted feature extraction techniques such as edge detection and template matching, struggle with challenges including noise, varying illumination, and the diverse morphology of craters³⁵. In contrast, deep learning models, particularly CNNs, have demonstrated the ability to automatically learn relevant features from raw data, making them highly effective for crater identification tasks³⁶. CNNs can automatically learn to detect features such as crater rims, shadows, and textures, which are critical for distinguishing craters from other geological formations³⁷. The ability of CNNs to generalise from large datasets has made them a popular choice for planetary science applications, where manual feature extraction is often impractical³⁸. For example, Silburt et al.¹¹ employed CNNs to classify lunar craters based on their morphological features, achieving robust performance even in complex terrains. Similarly, the YOLO model has been utilised for real-time crater detection in planetary exploration missions, offering a balance between speed and accuracy³⁹. These advancements highlight the potential of deep learning to revolutionise planetary science by automating and improving the accuracy of crater detection and analysis. However, CNNs require large amounts of labelled data for training, and their performance may degrade when applied to datasets with significant domain shifts or limited annotations.

Figure 2 shows our process for using the CNN model as a benchmark for the crater image classification task. To allow comparison with more advanced models, such as YOLO and ResNet-50¹⁹, on a dataset in YOLO format, we crop each crater region in the original image into a single image of 128 × 128 pixels. We also introduce the “non-crater” category to construct a four-class classification task. The data is organised in structured training, validation, and testing directories, and loaded and augmented by the PyTorch Dataset class⁴⁰ and the Keras ImageDataGenerator⁴¹. This includes pixel value scaling, random rotation, and horizontal flip operations to enhance the robustness and generalisation ability of the model. The data is then formatted for model input and output: the pixel values of the images and their labels are converted into tensors for faster computation, scaled by a data scaler, and finally packed into a tuple containing both the image and its label. The model input shape is 128 × 128 × 3, since the images are standardised to 128 pixels in height and 128 pixels in width with three colour channels. The model was trained with the Adam optimiser for 30 epochs, with categorical cross-entropy as the loss function⁴². The final results were evaluated by accuracy, loss, classification reports, and confusion matrix.
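Cropping a crater from a YOLO-format annotation requires converting the normalised box to pixel coordinates. The sketch below shows that conversion under the standard YOLO label layout (class id, centre x, centre y, width, height, all normalised to the image size); the function names are hypothetical, and the actual crop and resize to 128 × 128 pixels are left as a comment because they depend on the image library used.

```python
def parse_label_line(line):
    """Parse one YOLO label line: 'class x_centre y_centre width height'."""
    parts = line.split()
    return int(parts[0]), tuple(float(value) for value in parts[1:5])


def yolo_to_pixel_box(box, img_w, img_h):
    """Convert a normalised YOLO box to pixel corners, clipped to the image."""
    xc, yc, bw, bh = box
    x1 = max(0, round((xc - bw / 2) * img_w))
    y1 = max(0, round((yc - bh / 2) * img_h))
    x2 = min(img_w, round((xc + bw / 2) * img_w))
    y2 = min(img_h, round((yc + bh / 2) * img_h))
    return x1, y1, x2, y2


# Hypothetical label for a crater in the middle of a 640 x 640 image.
class_id, box = parse_label_line("1 0.5 0.5 0.25 0.25")
crop_box = yolo_to_pixel_box(box, 640, 640)

# With an image array, the crop would be image[y1:y2, x1:x2], followed by a
# resize to 128 x 128 pixels before the crop is passed to the classifier.
```

Clipping to the image bounds matters for craters that touch the border, where a centred box would otherwise extend outside the image.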

YOLO model

Unlike traditional two-stage detectors, the YOLO model¹⁸ predicts bounding boxes and class probabilities directly from the input image in a single forward pass, enabling real-time performance⁴³. This efficiency makes YOLO particularly suitable for applications requiring fast processing, such as onboard spacecraft systems or large-scale planetary image analysis⁴⁴. However, the single-stage design of YOLO can sometimes result in lower precision for small or densely packed objects, which may require additional optimisation for specific tasks such as crater detection⁴⁵.

Figure 3 shows our process for using the YOLO method. We use YOLO in two distinct ways. In the first, YOLO primarily serves as a data annotation tool: the original dataset is first annotated in YOLO format, because more traditional CNN models cannot directly process YOLO data with a background (non-crater) category. Each image annotated in YOLO format is then cropped to show a single crater and resized to 128 × 128 pixels, suitable for CNN training. Here, YOLO mainly provides accurate positioning information and data preprocessing. The second use is the direct application of the YOLO model itself: we use the latest version of YOLO (YOLO-v11) for training, detection, and classification of large, medium, and small craters in the image data. We also offer two strategies for predicting on ultra-large 4K resolution planetary surface images: one where the complete large image is input to YOLO for fast and efficient detection of large impact craters, and the other where the large image is segmented into several 640 × 640 regions using a sliding window⁴⁶. After making individual predictions for each region, we apply Non-Maximum Suppression (NMS) to consolidate overlapping detection boxes³², which substantially raises the precision and accuracy of small crater detection. Thus, YOLO in the first use mainly performs data preprocessing and provides CNN training data, while YOLO in the second use serves as the principal model for extensive multi-scale crater detection. Together, the prediction strategy and post-processing demonstrate the advantages and applicability of YOLO in real deep learning applications.

ResNet model

ResNet-50 is a 50-layer implementation of the residual network (ResNet) architecture, which uses skip connections to help mitigate the vanishing gradient problem in deep networks³⁶. These connections allow the network to learn residual mappings, enabling the training of very deep architectures without degradation in performance⁴⁷. ResNet-50 has demonstrated exceptional results in image classification tasks¹⁹, particularly in scenarios requiring the recognition of complex patterns, such as crater identification⁴⁸. Using pre-trained weights from large datasets such as ImageNet, ResNet-50 can achieve high accuracy even with limited planetary data, making it a powerful tool for crater detection and classification⁴⁹. However, the computational cost of ResNet-50 can be high, and its performance could be limited when applied to very high-resolution images or datasets with significant class imbalance.

Crater identification with the ResNet-50 deep learning model is shown in Fig. 4¹⁹. An image is first preprocessed using the YOLO annotation coordinates; the crater area is then cropped and resized to a fixed 128 × 128 pixels. Next, real-time data augmentation is performed by Keras’ ImageDataGenerator to improve the generalisation of the model. The model itself is based on a pre-trained ResNet-50; only the top layer structure is customised. Global average pooling, batch normalisation, a fully connected layer with L2 regularisation, and a Dropout layer are used to reduce the risk of overfitting. The real business data is from OUC and has about four types of targets, which are classified by the Softmax function in the output layer. We exclude the non-crater class from the analysis. In the training process, we use the Adam optimiser and the cross-entropy loss function⁴², and monitor the performance of the model in real time. The final results report the accuracy and stability of the model based on 30 independent experimental runs.

Technical details

Our crater detection system combines three models: YOLO-v11⁵⁰ for object detection, and ResNet-50 and a conventional CNN for crater classification. For YOLO-v11, we take a model included with the project and begin training from scratch. ResNet-50 is initialised with ImageNet pre-trained weights¹⁹, and all convolutional layers are frozen during training. Our own CNN model is built by hand and trained from scratch without any pre-trained weights. These models are used to compare performance across the detection and classification tasks.

We implemented the framework in Python, employing TensorFlow Keras⁴¹ for the CNN and ResNet-50 models and the Ultralytics library for YOLO-v11, supported by NumPy⁵¹, OpenCV⁵², Matplotlib, Seaborn, and Plotly. The models were trained on a workstation with an NVIDIA GeForce RTX 4060 GPU and 16 GB of memory.

A custom dataset named datasets was created, comprising three crater categories: large, medium, and small. The data come from Mars and the Moon. We created two versions of the dataset: one in the format required by YOLO for object detection, and one in a cropped format (based on YOLO bounding boxes) for the classification models. Each was further divided into training (60%), validation (10%), and test (30%) sets; Table 1 shows the number of samples in each set for Mars craters, and Table 2 presents the distribution of craters for the Moon.
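A 60/10/30 split of an imbalanced dataset can be sketched as below. Splitting each class separately (a stratified split) is an assumption made here so that rare large craters appear in every subset; the text does not say how the split was drawn, so the resulting counts are illustrative and may differ from Tables 1 and 2. The function name and the fixed seed are hypothetical.

```python
import random


def stratified_split(labels, fractions=(0.6, 0.1, 0.3), seed=0):
    """Split sample indices into train/val/test, class by class."""
    by_class = {}
    for index, label in enumerate(labels):
        by_class.setdefault(label, []).append(index)

    rng = random.Random(seed)  # fixed seed so the split is reproducible
    train, val, test = [], [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_train = round(fractions[0] * len(indices))
        n_val = round(fractions[1] * len(indices))
        train += indices[:n_train]
        val += indices[n_train:n_train + n_val]
        test += indices[n_train + n_val:]  # the remainder goes to the test set
    return train, val, test


# Labels built from the Mars counts stated in the text (135 / 20 / 16).
mars_labels = ["small"] * 135 + ["medium"] * 20 + ["large"] * 16
train_idx, val_idx, test_idx = stratified_split(mars_labels)
```

Because of rounding within each class, the realised proportions are close to, but not exactly, 60/10/30.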

Table 1 Distribution of craters by size in the dataset of Mars

Table 2 Distribution of craters by size in a selected region of the Moon

In all experiments, we conduct thirty independent runs and report the mean and standard deviation of the results.

For classification tasks, we used data augmentation via the Keras ImageDataGenerator. This included rescaling the data and applying other standard transformations, such as random rotation (±15°), horizontal flipping, and width and height shifting (±10%). We train the YOLOv11 model with a batch size of 32 and an image resolution of 640 × 640 for 200 epochs. We train the CNN and ResNet-50 models for 30 epochs with a batch size of 59 and an input resolution of 128 × 128.

Results

Experiment design

We evaluate and compare the performance of three deep learning models (YOLO, CNN, and ResNet-50) in crater detection and classification tasks. The experiments follow a phased design approach, utilising NASA’s Mars crater dataset and processed lunar crater images from Roboflow. All crater annotations adhere to standard specifications and are categorised into three classes based on size: large craters (>50 km), medium craters (10–50 km), and small craters (<10 km).

The research data comprises MRO HiRISE high-resolution images of Mars from NASA and lunar crater image data from the Roboflow platform. The original images from both sources are very large and of high quality. We crop and scale the Mars images, as shown in Figs. 5 and 6, to a size suitable for model training, and relabel them after cropping.

In the YOLO-v11⁵⁰ experiment, input images are resized to a resolution of 640 × 640 pixels, and the model is trained using the standard YOLO bounding box annotation format. The model architecture fully incorporates the YOLO framework’s backbone, feature fusion layers, and detection head components. We train the model with the Adam optimiser (initial learning rate = 0.001) for 30 epochs, with an early stopping mechanism to prevent overfitting. For large-scale image processing, we test two methods: direct prediction and a sliding window approach. The latter employs a 30% overlap between windows to ensure detection continuity and uses Non-Maximum Suppression (NMS) to eliminate redundant bounding boxes.

The second experiment aims to compare the performance difference between traditional Convolutional Neural Networks (CNNs) and the YOLO framework. We use the same dataset as in Experiment 1, but convert it into a format suitable for CNN training. Individual craters are extracted based on YOLO annotations and resized to 128 × 128 pixels. Training is conducted using the categorical cross-entropy loss function and the Adam optimiser (learning rate = 0.001) for 30 epochs. To prevent overfitting, ModelCheckpoint and EarlyStopping mechanisms are implemented⁴¹. In the third experiment, we introduce the ResNet-50 model to verify the performance of a deeper neural network in the crater detection task.

Results

We observe that the average ranks of the three models are as follows: CNN (1.67), YOLO (1.84), and ResNet-50 (2.50), as shown in Tables 3 and 4. The per-category rankings show that YOLO performed best in large crater detection, while CNN dominated in small crater identification. The CNN and YOLO models achieved the same average rank on Mars data (1.67), while CNN led on lunar data (1.67 vs 2.00 for YOLO). The detailed analysis that follows examines the reasons for these rankings through the precision and recall of each model.

Table 3 Performance (Rank) of different models for crater detection of Mars

Table 4 Performance (Rank) of different models for crater detection of the Moon

First, we use data from Mars and the Moon for CNN-based crater classification and perform data preprocessing on the data to convert it into a format that is suitable for CNN. Additionally, we use ModelCheckpoint and EarlyStopping during model training to ensure good model prediction accuracy.

After training the CNN model using the collected crater data, the classification performance reported in Tables 5 and 6 reveals a significant performance issue in the model’s ability to recognise different crater categories. The results demonstrate that Category 1 (small craters) performs exceptionally well, achieving an F1-score of around 0.97, indicating almost no misclassifications in detecting small craters, with near-perfect precision and recall rates. Figure 7 presents the Precision-Recall (PR) curves of the CNN model for a typical model training run. We find that Class 1 has the largest area under the curve (AUC), indicating the best performance of the three classes. Figure 8 presents the training losses of the CNN model for a typical model training run, where the training loss consistently decreases over time. Similarly, validation loss decreases gradually for Mars but less so for the Moon. Note that the plots end at epochs 12 and 18, respectively, due to early stopping.

**Fig. 7: CNN model training for understanding class imbalance.**

Table 5 CNN report for crater detection on Mars (Mean ± Std)

Table 6 CNN report for crater detection on the Moon (Mean ± Std)

In terms of overall results (Tables 5 and 6), the model achieves a weighted accuracy of 0.91 on the Moon and 0.82 on Mars, which appears strong at first glance. However, its macro-average F1-scores are only 0.57 and 0.56, highlighting a critical issue: while the model excels in classifying the majority class (small craters), its performance varies significantly across categories. This discrepancy likely stems from severe class imbalance in the training data, where small craters dominate while large craters and non-crater regions are underrepresented. In conclusion, although the CNN model performs exceptionally well in classifying small craters, its performance in other categories requires further improvement.
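The gap between a support-weighted score and a macro average can be reproduced with a small worked example. The per-class counts below are hypothetical values chosen only to mimic a dataset dominated by small craters; they are not taken from Tables 5 and 6, and the function names are illustrative.

```python
def f1_score(tp, fp, fn):
    """F1 from true positives, false positives and false negatives."""
    denominator = 2 * tp + fp + fn
    return 2 * tp / denominator if denominator else 0.0


def macro_and_weighted_f1(counts):
    """counts maps class name -> (tp, fp, fn); support is tp + fn."""
    scores = {name: f1_score(*c) for name, c in counts.items()}
    supports = {name: tp + fn for name, (tp, fp, fn) in counts.items()}
    macro = sum(scores.values()) / len(scores)  # every class counts equally
    weighted = sum(scores[n] * supports[n] for n in scores) / sum(supports.values())
    return macro, weighted


# Hypothetical confusion counts for an imbalanced three-class problem.
example_counts = {
    "small": (180, 15, 3),   # majority class, classified well
    "medium": (4, 3, 10),    # minority class, mostly missed
    "large": (1, 2, 4),      # minority class, mostly missed
}
macro_f1, weighted_f1 = macro_and_weighted_f1(example_counts)
```

In this example the weighted score stays high because the majority class carries most of the support, while the macro average drops because the two minority classes count as much as the majority class.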

Next, we train the YOLO-11 model to detect craters in three categories: large, medium, and small. Figures 9 and 10 show the crater data before and after annotation. Based on 30 independent model training runs, the performance of the YOLO model varies with crater size. As shown in Tables 7 and 8, large craters (Class 0) have the highest recall, at 0.78 ± 0.09 on the Moon and 0.90 ± 0.06 on Mars, but with relatively low accuracy, which suggests that this category may produce more false alarms. Medium craters (Class 2) show the most balanced performance, with higher accuracy and recall as well as the best overall performance. Small craters (Class 1) have the lowest F1-score, at 0.63 ± 0.01 on the Moon and 0.67 ± 0.03 on Mars, which indicates that small targets are hard to identify. On the whole, model detection is somewhat more stable for medium-sized targets. The results indicate that the YOLO model has high accuracy and stability for large crater detection, while its performance for small craters remains insufficient.
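The "mean ± std" values reported over independent runs can be obtained by aggregating the per-run scores of each class. The sketch below is a minimal illustration; the five F1-scores are hypothetical, the helper name `mean_std` is introduced here, and the use of the sample standard deviation is an assumption, since the estimator is not specified in the text.

```python
from statistics import mean, stdev


def mean_std(values):
    """Format per-run scores as 'mean ± std' using the sample standard deviation."""
    return f"{mean(values):.2f} ± {stdev(values):.2f}"


# Hypothetical F1-scores of one class from five independent training runs.
f1_runs = [0.62, 0.64, 0.63, 0.62, 0.64]

summary = mean_std(f1_runs)
print(summary)
```

A small standard deviation relative to the mean indicates that the score is stable across runs, whereas a large one signals sensitivity to initialisation or data splits.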

**Fig. 10: Mars crater classification results.**

Table 7 YOLO report for crater detection on the Moon (Mean ± Std)


Table 8 YOLO performance for crater detection on Mars (Mean ± Std)


Figure 11 presents the PR curves of the YOLO model for a typical model training run. The average precision at Intersection over Union (IoU) = 0.5 is evidently better for Mars than for the Moon. Figure 12 presents the Mean Average Precision (mAP) curve (mAP@0.50–0.95) of the YOLO model for a typical model training run, which gradually increases throughout the 200 epochs for both the Moon and Mars. We average over all 10 IoU thresholds (0.50 to 0.95 in steps of 0.05) and take the mean over all classes. For both Mars and the Moon, the model finishes with an mAP between 0.6 and 0.7, indicating that the localisation quality of the model may not be desirable despite the good detection precision seen in the earlier PR curves (Fig. 11).
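To make the metric concrete, the sketch below computes the IoU of two axis-aligned boxes and averages per-class average precision (AP) over the ten thresholds. It is a simplified illustration rather than the evaluation code used in this study: the boxes, the AP values, and the function names `iou` and `map_50_95` are hypothetical, and the AP at each threshold is taken as given instead of being computed from a PR curve.

```python
def iou(box_a, box_b):
    """Intersection over Union of two boxes given as (x_min, y_min, x_max, y_max)."""
    inter_w = max(0.0, min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]))
    inter_h = max(0.0, min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]))
    inter = inter_w * inter_h
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def map_50_95(ap_table):
    """Average AP over thresholds for each class, then average over classes."""
    per_class = [sum(aps) / len(aps) for aps in ap_table]
    return sum(per_class) / len(per_class)


# The ten IoU thresholds 0.50, 0.55, ..., 0.95.
thresholds = [0.50 + 0.05 * i for i in range(10)]

# Hypothetical predicted and ground-truth boxes for one crater.
predicted = (10, 10, 50, 50)
truth = (12, 12, 52, 52)
overlap = iou(predicted, truth)

# Number of thresholds at which this prediction still counts as a match.
matches = sum(overlap >= t for t in thresholds)

# Hypothetical AP values: one row per class, one column per threshold.
ap_table = [[0.8] * 10, [0.6] * 10, [0.4] * 10]
score = map_50_95(ap_table)
print(f"IoU={overlap:.3f}, matched at {matches} of 10 thresholds, mAP={score:.2f}")
```

A prediction that is only slightly misaligned passes the looser thresholds but fails the stricter ones, which is why mAP@0.50–0.95 is more sensitive to localisation quality than AP at IoU = 0.5.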

**Fig. 11: YOLO classification of Mars and the Moon.**

**Fig. 12: YOLO mAP model training performance.**

Finally, we train the ResNet-50 model to detect craters from the Mars and Moon data (Tables 9 and 10). The model shows consistent classification performance over 30 independent model training and test runs. ResNet-50 performs very well for small craters, with high precision (0.95 ± 0.03 for the Moon and 0.80 ± 0.05 for Mars), recall (0.99 ± 0.01 for the Moon and 0.95 ± 0.16 for Mars), and F1-score (0.97 ± 0.01 for the Moon and 0.86 ± 0.10 for Mars). For medium and large craters, however, the results are mixed. Although the precision is good, the very low recall and F1-scores show that the model struggles to find these larger craters reliably: it often misses medium and large craters, but the ones it does detect are usually correct. Finally, Figure 13 presents the PR curves of the ResNet-50 model for a typical model training run, where Class 1 has the largest AUC, indicating the best performance of the three classes.

**Fig. 13: Understanding class imbalance with ResNet model training.**

Table 9 ResNet-50 performance for crater detection on the Moon (Mean ± Std)


Table 10 ResNet-50 performance for crater detection on Mars (Mean ± Std)


Comparison of models

To compare the models for crater detection, we ran CNN, YOLO, and ResNet-50 on the same dataset with consistent evaluation metrics. The CNN model demonstrates strong performance in small crater detection with an F1-score of 0.97 ± 0.01, while showing moderate capability for medium craters (F1-score: 0.65 ± 0.06). However, its performance on large craters remains limited (F1-score: 0.70 ± 0.05), suggesting challenges in capturing broader contextual features.

YOLO exhibits the most balanced performance across all crater sizes, achieving F1-scores of 0.82 ± 0.04 (large), 0.81 ± 0.05 (medium), and 0.78 ± 0.03 (small). This consistent performance highlights YOLO’s effectiveness in handling multi-scale detection tasks, likely due to its integrated feature pyramid network and anchor-free detection mechanism. ResNet-50 shows excellent performance in small crater recognition (F1-score: 0.95 ± 0.02) and has significantly improved capability for large craters (F1-score: 0.74 ± 0.06) compared to previous implementations. The model’s medium crater detection (F1-score: 0.72 ± 0.05) benefits from its deep residual learning architecture.

Discussion

We systematically compared the performance of three deep learning models (CNN, YOLO, and ResNet-50) in crater detection tasks and examined how the different model architectures behave when addressing multi-scale object detection. Our results demonstrate that the YOLO model shows significant advantages in overall performance balance, which we attribute to its mechanism for combining global and local features. We also find that the CNN model performs well in detecting small craters, achieving an F1-score of 0.98 ± 0.00, which is closely related to its design focus on local feature extraction. Although the ResNet-50 model performs well in detecting small craters, its ability to identify large craters is inadequate, which reflects the inherent challenges that deep neural networks face when handling multi-scale targets.

The YOLO model demonstrates the value of multi-scale adaptability; it achieves an F1-score of 0.70 ± 0.08 in detecting large craters and 0.76 ± 0.05 for medium craters. This indicates that the architecture can effectively capture feature information of targets at different scales. In contrast, while the traditional CNN model¹⁰ offers higher computational efficiency, its performance in handling large-scale targets is inadequate, which is related to the limitations of its design. The performance of the ResNet-50 model shows that simply increasing model depth does not always lead to comprehensive performance improvements, especially when addressing domain-specific tasks.

The impact of class imbalance on model performance is a critical issue observed in the experimental results. Due to the predominance of small crater samples in the dataset, the performance of all three models in detecting medium and large craters is constrained to varying degrees. This observation is consistent with recent related studies⁴, which highlight the widespread issue of uneven sample distribution in planetary surface feature detection. In addition, the hybrid strategy that combines sliding windows with non-maximum suppression (NMS) effectively improves the detection accuracy of small craters, but it also increases computational resource consumption. This trade-off between performance and efficiency requires careful consideration in practical applications.
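The combination of sliding windows and NMS can be sketched as follows. This is a minimal illustration of the general technique (overlapping tiles followed by greedy NMS), not the implementation used in this study; the window size, stride, IoU threshold, detections, and the function names `sliding_windows`, `iou`, and `nms` are all assumptions made for the example. The tiling helper also assumes that the image dimensions minus the window size are multiples of the stride, so that the tiles reach the image border.

```python
def sliding_windows(width, height, size, stride):
    """Yield (x_min, y_min, x_max, y_max) tiles that overlap when stride < size."""
    for y in range(0, max(height - size, 0) + 1, stride):
        for x in range(0, max(width - size, 0) + 1, stride):
            yield (x, y, x + size, y + size)


def iou(box_a, box_b):
    """Intersection over Union of two (x_min, y_min, x_max, y_max) boxes."""
    inter_w = max(0.0, min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]))
    inter_h = max(0.0, min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]))
    inter = inter_w * inter_h
    union = ((box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
             + (box_b[2] - box_b[0]) * (box_b[3] - box_b[1]) - inter)
    return inter / union if union > 0 else 0.0


def nms(detections, iou_threshold=0.5):
    """Greedy NMS: keep the highest-scoring box, drop boxes that overlap it too much."""
    kept = []
    for box, score in sorted(detections, key=lambda d: d[1], reverse=True):
        if all(iou(box, kept_box) <= iou_threshold for kept_box, _ in kept):
            kept.append((box, score))
    return kept


# A 1024 x 1024 image tiled with 512-pixel windows and 50% overlap.
windows = list(sliding_windows(1024, 1024, size=512, stride=256))

# Hypothetical detections in full-image coordinates: the first two are the same
# crater seen from two overlapping windows, the third is a separate crater.
detections = [
    ((100, 100, 200, 200), 0.9),
    ((105, 102, 205, 198), 0.8),
    ((400, 400, 450, 450), 0.7),
]
kept = nms(detections, iou_threshold=0.5)
print(len(windows), "windows;", len(kept), "detections kept")
```

The overlap between tiles is what allows small craters near tile borders to be seen whole, but it also multiplies the number of forward passes and produces duplicate boxes that NMS must remove, which is the source of the additional computational cost.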

The primary contribution of this study lies in establishing a complete evaluation framework for deep learning models, ensuring the reliability of the conclusions through strictly controlled experimental conditions and repeated validation. The optimisation of the sliding window strategy significantly improves the detection accuracy of small craters, providing an important reference for subsequent research. Additionally, the class imbalance issue revealed by the study points the way for future improvements in data collection and annotation. These findings not only offer direct guidance for feature detection research in planetary science but also provide valuable insights for the development of multi-scale object detection algorithms in the field of computer vision.

In terms of limitations, we note that the region selected is small and hence there were challenges in model training. We used a small region to demonstrate the effectiveness of the models, and future work can consider a larger region. Future research can focus on addressing several key issues identified in this study. At the model level, more efficient architectural designs are needed to balance detection accuracy and computational cost. At the data level, smarter sample augmentation techniques should be developed to reduce class imbalance. At the application level, cross-planetary transfer learning solutions should be explored to enhance the applicability of the model. Breakthroughs in these areas will significantly advance the development of automated planetary surface analysis technologies, providing more robust technical support for deep space exploration missions. In the future, we will continue in-depth research on more complex network architectures, like ResNet-50, and consider advanced techniques such as model fusion and transfer learning⁵³ to further improve the accuracy and generality of crater detection, supporting the further development of space exploration and geological research.


We presented a robust deep learning framework for automated crater classification using a two-stage process that deploys three models: CNN, ResNet, and YOLO. The results demonstrated the effectiveness of these models in classifying craters into three classes. After cropping and scaling the images, we used the YOLO model to detect the location and boundaries of the different craters, and labels were produced for each processed image. Applying the YOLO model again, along with CNN and ResNet-50, we classified the crater sizes and evaluated appropriate metrics, taking class imbalance into account, to identify which model is best suited for the task. The results show that the CNN model demonstrates near-perfect recognition accuracy in small crater detection tasks but performs quite poorly for large and medium craters, especially for the Moon dataset. This may be explained by the severe class imbalance between the small craters and the larger craters. In contrast, YOLO demonstrated balanced detection of multiscale craters across all three classes while maintaining high computational efficiency, possibly due to its feature extraction mechanism and hybrid prediction strategy. Although CNN was the best for small crater identification, YOLO surpassed it in the larger crater categories. These results suggest that increasing network depth alone does not necessarily lead to overall improvements in detection performance, particularly when dealing with multi-class detection tasks with imbalanced sample distributions.

Data availability

We present code and data in our GitHub repository https://github.com/sydney-machine-learning/crater-identification.

Code availability

We present code and data in our GitHub repository https://github.com/sydney-machine-learning/crater-identification.

References

Shirmard, H. et al. A comparative study of convolutional neural networks and conventional machine learning models for lithological mapping using remote sensing data. Remote Sens. 14, 819 (2022).
Erdem, T., Speretta, S. & Gill, E. Autonomous navigation for deep space small satellites scientific and technological advances. Acta Astron. 193, 56–74 (2022).
Yang, C. et al. Lunar impact crater identification and age estimation with Chang’e data by deep and transfer learning. Nat. Commun. 11, 6358 (2020).
Di, K., Li, W., Yue, Z., Sun, Y. & Liu, Y. A machine learning approach to crater detection from topographic data. Adv. Space Res. 54, 2419–2429 (2014).
Xiong, L. et al. Deep learning detects entire multiple-size lunar craters driven by elevation data and topographic knowledge. Geo Spatial Inf. Sci. 1–18 https://doi.org/10.1080/10095020.2025.2452932 (2025).
Georgiou, T., Liu, Y., Chen, W. & Lew, M. A survey of traditional and deep learning-based feature descriptors for high dimensional data in computer vision. Int. J. Multimed. Inf. Retr. 9, 135–170 (2020).
Vinogradova, T., Burl, M. & Mjolsness, E. Training of a crater detection algorithm for Mars crater imagery. In Proc. IEEE Aerospace Conference, vol. 7, 7 (IEEE, 2002).
LeCun, Y., Bengio, Y. & Hinton, G. Deep learning. Nature 521, 436–444 (2015).
Tewari, A., Prateek, K., Singh, A. & Khanna, N. Deep learning based systems for crater detection: a review, arXiv preprint, https://doi.org/10.48550/arXiv.2310.07727 (2023).
Li, Y., Zhang, H., Xue, X., Jiang, Y. & Shen, Q. Deep learning for remote sensing image classification: a survey. Wiley Interdiscip. Rev. Data Min. Knowl. Discov. 8, e1264 (2018).
Silburt, A. et al. Lunar crater identification via deep learning. Icarus 317, 27–38 (2018).
Yang, S. & Cai, Z. High-resolution feature pyramid network for automatic crater detection on Mars. IEEE Trans. Geosci. Remote Sens. 60, 4601012 (2021).
Li, S. et al. Deep learning for hyperspectral image classification: an overview. IEEE Trans. Geosci. Remote Sens. 57, 6690–6709 (2019).
LeCun, Y., Bottou, L., Bengio, Y. & Haffner, P. Gradient-based learning applied to document recognition. Proc. IEEE 86, 2278–2324 (1998).
Ronneberger, O., Fischer, P. & Brox, T. U-net: Convolutional networks for biomedical image segmentation. In Medical Image Computing and Computer-Assisted Intervention (MICCAI), 234–241 (Springer, 2015).
Ren, S., He, K., Girshick, R. & Sun, J. Faster R-CNN: towards real-time object detection with region proposal networks. Adv. Neural Inf. Process. Syst. 28, 91–99 (2015).
Cui, L., Liu, K. & Chen, N. A review on degradation modeling in reliability analysis. IEEE Trans. Reliab. 69, 1169–1189 (2020).
Redmon, J., Divvala, S., Girshick, R. & Farhadi, A. You only look once: unified, real-time object detection. In Proc. 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 779–788 (IEEE, 2016).
He, K., Zhang, X., Ren, S. & Sun, J. Deep residual learning for image recognition. In Proc. IEEE Conference on Computer Vision and Pattern Recognition 770–778 (IEEE, 2016).
Ramasamy, L. K., Kakarla, J., Isunuri, B. V. & Singh, M. Multi-class brain tumor classification using residual network and global average pooling. Multimed. Tools Appl. 80, 13429–13438 (2021).
Chen, X., Wang, X., Zhang, K., Guo, Y. & Sun, J. Recent advances and clinical applications of deep learning in medical image analysis. Med Image Anal. 79, 102444 https://doi.org/10.1016/j.media.2022.102444 (2022).
Sahoo, S. K., Kumar, S., Abedin, M. Z. H., Tiwari, M. K. & Gunasekaran, A. Deep learning applications in manufacturing operations: a review of trends and ways forward. J. Enterp. Inf. Manag. 35, 1215–1240 (2022).
Del Prete, R., Saveriano, A. & Renga, A. A deep learning-based crater detector for autonomous vision-based spacecraft navigation. In Proc. 2022 IEEE 9th International Workshop on Metrology for AeroSpace (MetroAeroSpace) 231–236 (IEEE, 2022).
Zhao, Y., Zhang, X., Feng, W. & Xu, J. Deep learning classification by ResNet-18 based on the real spectral dataset from multispectral remote sensing images. Remote Sens. 14, 4883 (2022).
Li, K. Robustness analysis of traffic sign recognization based on ResNet. Highlights Sci. Eng. Technol. 39, 1188–1195 (2023).
Zhao, X., Liu, W., Xing, W. & Wei, X. Da-res2net: a novel densely connected residual attention network for image semantic segmentation. KSII Trans. Internet Inf. Syst. 14, 4426–4442 (2020).
Pan, S. J. & Yang, Q. A survey on transfer learning. IEEE Trans. Knowl. Data Eng. 22, 1345–1359 (2010).
McEwen, A. S. et al. Mars reconnaissance orbiter’s high resolution imaging science experiment (hirise). J. Geophys. Res. Planets 112, E05S02 (2007).
Hamelin, M. et al. Electron conductivity and density profiles derived from the mutual impedance probe measurements performed during the descent of huygens through the atmosphere of titan. Planet. Space Sci. 55, 1964–1977 (2007).
Ciaglia, F., Zuppichini, F. S., Guerrie, P., McQuade, M. & Solawetz, J. Roboflow 100: a rich, multi-domain object detection benchmark. arXiv preprint, https://doi.org/10.48550/arXiv.2211.13523 (2022).
System, N. P. D. Mars Reconnaissance Orbiter HiRISE data. NASA Planetary Data System. https://pds.mcp.nasa.gov/portal.
Neubeck, A. & Van Gool, L. Efficient non-maximum suppression. In Proc. 18th International Conference on Pattern Recognition (ICPR’06) Vol. 3, 850–855 (ACM, 2006).
NASA. NASA Mars Exploration Program (2024). https://mars.nasa.gov. Accessed: 2025-04-01.
Roboflow. Roboflow universe: Public datasets for computer vision (2024). https://universe.roboflow.com. Accessed: 2025-04-01.
Urbach, E. R. & Stepinski, T. F. Automatic detection of sub-km craters in high resolution planetary images. Planet. Space Sci. 57, 880–887 (2009).
He, K., Zhang, X., Ren, S. & Sun, J. Deep residual learning for image recognition. In Proc. IEEE Conference on Computer Vision and Pattern Recognition 770–778 (IEEE, 2016).
Krizhevsky, A., Sutskever, I. & Hinton, G. E. Imagenet classification with deep convolutional neural networks. Adv. Neural Inf. Process. Syst. 25, 1097–1105 (2012).
Wu, B. et al. Monitoring the vertical distribution of maize canopy chlorophyll content based on multi-angular spectral data. Remote Sens. 13, 987 (2021).
Redmon, J. & Farhadi, A. Yolov3: an incremental improvement. arXiv preprint, https://doi.org/10.48550/arXiv.1804.02767 (2018).
Paszke, A. et al. Pytorch: an imperative style, high-performance deep learning library. In Advances in Neural Information Processing Systems, vol. 32, 8024–8035 (2019).
Chollet, F. Deep Learning with Python (Manning Publications, 2017).
Kingma, D. P. & Ba, J. Adam: a method for stochastic optimization. arXiv preprint, https://doi.org/10.48550/arXiv.1412.6980 (2014).
Redmon, J. & Farhadi, A. Yolo9000: Better, faster, stronger. arXiv preprint, https://doi.org/10.48550/arXiv.1612.08242 (2016).
Howard, A. G. et al. Mobilenets: efficient convolutional neural networks for mobile vision applications. arXiv preprint, https://doi.org/10.48550/arXiv.1704.04861 (2017).
Lin, T.-Y., Goyal, P., Girshick, R., He, K. & Dollár, P. Focal loss for dense object detection. In Proc. IEEE International Conference on Computer Vision 2980–2988 (IEEE, 2017).
Zhao, Y. & Liu, L. Object detection using sliding window on large images: a case study on remote sensing imagery. Remote Sens. 12, 845 (2020).
He, K., Zhang, X., Ren, S. & Sun, J. Identity mappings in deep residual networks. In Proc. European Conference on Computer Vision 630–645 (Springer, 2016).
Huang, G., Liu, Z., van der Maaten, L. & Weinberger, K. Q. Densely connected convolutional networks. In Proc. IEEE Conference on Computer Vision and Pattern Recognition 4700–4708 (IEEE, 2017).
Yosinski, J., Clune, J., Bengio, Y. & Lipson, H. How transferable are features in deep neural networks? Adv. Neural Inf. Process. Syst. 27 (2014).
Ultralytics. Yolo by Ultralytics (2023). Available at https://github.com/ultralytics/ultralytics.
Harris, C. R., Millman, K. J., van der Walt, S. J. et al. Array programming with numpy. Nature 585, 357–362 (2020).
Bradski, G. The OpenCV Library. Dr. Dobb’s J. Softw. Tools 120, 122–125 (2000).
Tan, C. et al. A survey on deep transfer learning. In Proc. International Conference on Artificial Neural Networks 270–279 (Springer, 2018).


Acknowledgements

We thank Jinghong Liang from UNSW for earlier contributions to this study.

Author information

Authors and Affiliations

Transitional Artificial Intelligence Research Group, School of Mathematics and Statistics, UNSW Sydney, Sydney, NSW, Australia
Yihan Ma, Jessie Guo, Zeyang Yu & Rohitash Chandra
Centre for Artificial Intelligence and Innovation, Pingla Institute, Sydney, NSW, Australia
Jessie Guo & Rohitash Chandra

Authors

Yihan Ma
Jessie Guo
Zeyang Yu
Rohitash Chandra

Contributions

Y.M. contributed to writing (editing), coding, and experiments. J.G. contributed to editing, experiments, and results. Z.Y. contributed to writing and analysis. R.C. contributed to conceptualisation, project supervision, editing, and analysis. All authors have read and approved the manuscript.

Corresponding author

Correspondence to Rohitash Chandra.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.


About this article

Cite this article

Ma, Y., Guo, J., Yu, Z. et al. Deep learning framework for crater detection and identification on the Moon and Mars. npj Space Explor. 2, 19 (2026). https://doi.org/10.1038/s44453-026-00036-x


Received: 04 November 2025
Accepted: 31 March 2026
Published: 17 April 2026
Version of record: 17 April 2026
DOI: https://doi.org/10.1038/s44453-026-00036-x