Abstract
Understanding and preserving deep-sea ecosystems is paramount for marine conservation efforts. Automated classification of deep-sea biota can enable the creation of detailed habitat maps that not only aid biodiversity assessments but also provide essential data for evaluating ecosystem health and resilience. A substantial source of labelled data helps prevent overfitting and enables the training of deep learning models with numerous parameters. In this paper, we contribute a deep-sea remotely operated vehicle (ROV) image classification dataset of 3994 images featuring deep-sea biota belonging to 33 classes. We manually label the images through rigorous quality control with human-in-the-loop image labelling. Leveraging data from an ROV equipped with advanced imaging systems, our study provides benchmark results using prominent deep learning models for image classification. We use deep learning models including ResNet, DenseNet, Inception, and Inception-ResNet to benchmark the dataset, which features many classes with class imbalance. Our results show that the Inception-ResNet model provides a mean classification accuracy of 65%, with AUC scores exceeding 0.8 for each class.
Background & Summary
The Great Barrier Reef (GBR)1 is an invaluable ecological treasure, housing intricate and delicate deep-sea ecosystems that play a pivotal role in maintaining the region’s marine biodiversity2. As these ecosystems face increasing threats from climate change, habitat destruction, and anthropogenic activities3, understanding and preserving them have become paramount goals for marine conservation efforts4. Advanced tools for comprehending the complexities of these underwater environments, automated object (biota) detection5 in particular, have the potential to provide a better understanding of the deep-sea ecosystem more expeditiously and efficiently.
Analysing images from remotely operated vehicles (ROVs) is a complex process that requires human expertise in marine biology, geology, and environmental dynamics. Skilled analysts regularly identify and categorise marine organisms, recognise geological features, and detect anomalies; however, the sheer volume of data accumulated from ROVs makes exhaustive expert analysis infeasible. Software tools such as Coral Point Count with Excel extensions (CPCe)6, photoQuad7, Biigle8, and Squidle+9 have been used by experts to assist in the manual annotation of imagery, with minimal to no automation in the annotation process. Some examples of manual, skilled data annotation from the literature are as follows. Schoening et al.10 annotated 1340 images taken with ROVs and towed cameras with the help of five expert annotators using Biigle. The authors estimated the abundances of 20 fauna categories around the DISCOL (disturbance and recolonisation experiment) site in a manganese nodule area of the deep South Pacific11. Wright et al.12 manually annotated over 5,665 images of the One Tree Island shelf edge in the GBR to analyse changes in benthic community composition. Pawlik et al.13 visually analysed quadrat point-intercept data to calculate sponge cover in Caribbean mesophotic reefs. Bridge et al.14 analysed 718 images taken by an autonomous underwater vehicle (AUV) across transects around the Hydrographers Passage in the GBR and manually analysed 27 categories of macrofauna. Sih et al.15 analysed multibeam sonar data and baited ROV video footage across 42 sites in the central GBR and examined relationships between deep-reef fish communities and benthic habitat structure; they classified sponges, corals, macro-algae, reef fish, and other substrates. These methods are time-consuming and prone to bias because, owing to time limitations, only small, manageable representative samples of data are analysed16. Combining manual analysis with automation can improve the efficiency of data analysis, allowing larger data samples to be used and hence improving our understanding of the deep sea17.
In the literature, several machine learning models have been developed for the automatic classification of deep-sea flora and fauna. Lopez-Vazquez et al.18 have operated the crawler “Wally” (an Internet Operated Vehicle) at Barkley Canyon (Vancouver, Canada) at 870 m depth since 2009. In this study, the group collected over six thousand image samples that were classified into 10 classes using the NEPTUNE Canada Marine Life Field Guide19. They evaluated the raw images with quality metrics and designed an underwater image enhancement pipeline to mitigate problems such as image degradation. Furthermore, they reported that the classification pipeline, benchmarked with eight classical machine learning models on independent datasets, achieved an accuracy of 88.64%. Maka Niu is a low-cost imaging and analysis tool designed collaboratively for deep-sea exploration in Hawaii20. The imagery and sensor data were uploaded to Tator21 for analysis, and the models performed well for object detection. Tator is an online video annotation platform based on RetinaNet and Detectron222. Williams et al.16 evaluated CoralNet, a machine learning-based image analysis tool23, for the automation of coral reef benthic cover estimation using image data from NOAA (National Oceanic and Atmospheric Administration)24. The results showed that CoralNet correlates strongly with human analysts in the main Hawaiian Islands and American Samoa, indicating that CoralNet can improve the efficiency and consistency of coral reef monitoring programs. All these methods are based on point annotations and perform well for predicting benthic community cover; however, they perform poorly at, or cannot fully predict, the presence and absence of benthic communities.
Deep learning is a machine learning methodology that has been prominently used in computer vision problems such as image (object) classification25, and has also been used for classifying underwater marine environments26,27,28. Hence, object detection holds immense potential for identifying and quantifying the diverse benthic organisms29 and substrate types that characterise the deep-sea ecosystems of the GBR. Object detection not only facilitates the creation of detailed habitat maps but also aids in conducting thorough biodiversity assessments30,31. Furthermore, it offers essential data that enable scientists to evaluate the health and resilience of these ecosystems, providing a foundation for evidence-based conservation strategies. However, the success of object detection tools in this context hinges on the availability of accurate and comprehensive labelled datasets. Zurowietz et al.32 developed a machine learning assisted image annotation method (MAIA) for environmental monitoring and exploration using autoencoder networks and mask region-based convolutional neural networks. The method greatly improved annotation speeds; however, it is limited to binary classification, considering only the shell and animal classes. Dawkins et al.33 created an open-source computer vision software platform called Video and Image Analytics for Marine Environments (VIAME) for automating the image analysis process, for applications in marine imagery or any other type of video analytics. The platform successfully classified 12 types of sea creatures with good accuracy. The unique challenges posed by underwater imaging, such as variable lighting conditions, species diversity and life characteristics, and imaging resolution, require machine learning models to be robust and capable of generalising well beyond their training data31,34. Consequently, a substantial, quality-controlled and diverse dataset becomes an indispensable asset for developing models that can effectively identify and classify various objects amidst the complexities of real-world underwater scenarios.
In this paper, we introduce the Deepdive dataset, a deep-sea ROV image classification dataset that we meticulously curate and label through a human-in-the-loop35 approach to ensure high accuracy. The Deepdive dataset comprises a large collection of images spanning many classes of benthic biota. In the data curation process, we manually label 4158 images across 62 classes by leveraging data acquired by ROVs equipped with high-resolution imaging systems, obtaining a dataset that features class imbalance. Hence, this study showcases the practical applications of image classification techniques in marine conservation and habitat mapping. We provide a comparison of prominent pre-trained deep learning models for classification, including ResNet, DenseNet, Inception, and Inception-ResNet, for benchmarking the dataset.
Methods
Problem Background
Underwater imaging presents a unique set of challenges arising from environmental and technical factors. Various aspects can significantly impact the quality of underwater images20, such as species characteristics, environmental conditions, substrate properties, and equipment (machine) limitations. Environmental factors such as water turbidity, lighting conditions, and currents can decrease image clarity. Certain types of substrate, such as sand and coral, can affect image contrast and sharpness. Hardware and software limitations such as camera resolution and autofocusing capabilities are crucial to image quality. For instance, low camera resolution leads to poor image quality, while a lack of advanced autofocusing for mobile objects typically results in blurring. Additionally, the amount of natural light penetration, often limited in deeper waters, affects the overall image brightness. Species characteristics, such as the behaviour and mobility of underwater organisms, impose challenges on the acquisition of quality image-based data. Noise, caused by factors such as species sensitivity to sudden light changes, can further degrade image quality. Therefore, underwater imaging requires careful consideration of these challenges to capture accurate and informative visuals of aquatic environments and their inhabitants. Figure 1 shows a subset of images that are free of blemishes, and Fig. 2 presents cases of data samples that are of lower quality due to various technical and biological factors. All of these images were extracted from videos of ROV SuBastian dives that explored the Ribbon Reef canyons in the submarine canyon system of the outer GBR in 202036,37. The ROV dive video and photo data are publicly available on National Computational Infrastructure (NCI) Australia37. Ref. 36 describes the full deployment details for the R/V Falkor cruises on which these data were collected. Figure 6 shows the location of the dives and the types of transects the ROV took through six of the canyons of the reef. Table 2 lists the details of the six dive videos, imaged at varying depths using a SULIS Subsea Z71 4K (12X zoom) science camera mounted on ROV SuBastian. We extracted frame grabs from all these videos at 10-second intervals, obtaining 1546 images.
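As a guide to how frame grabs can be extracted at 10-second intervals from a dive video, the following is a minimal sketch assuming OpenCV; the video file name and output directory are hypothetical placeholders rather than part of the published dataset.

```python
# Sketch: extract frame grabs every 10 seconds from an ROV dive video (OpenCV).
# "dive_video.mp4" and "frame_grabs/" are hypothetical placeholders.
import os
import cv2

video_path = "dive_video.mp4"
out_dir = "frame_grabs"
os.makedirs(out_dir, exist_ok=True)

cap = cv2.VideoCapture(video_path)
fps = cap.get(cv2.CAP_PROP_FPS) or 25.0   # fall back if FPS metadata is missing
step = int(round(fps * 10))               # one frame every 10 seconds

frame_idx, saved = 0, 0
while True:
    ok, frame = cap.read()
    if not ok:
        break
    if frame_idx % step == 0:
        cv2.imwrite(os.path.join(out_dir, f"frame_{saved:05d}.png"), frame)
        saved += 1
    frame_idx += 1
cap.release()
print(f"Saved {saved} frame grabs")
```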
Human-in-the-loop
Human-in-the-loop (HITL) machine learning is a collaborative approach that leverages the complementary strengths of humans and machines in problem-solving35. This approach is widely employed across diverse applications, including image classification38,39, natural language processing40, and medical diagnosis41. Human involvement starts with collecting, labelling, cleaning, and organising data for machine learning; human experts then provide feedback and expertise for iterative model training and evaluation. HITL approaches have several advantages over traditional machine learning methods, including higher accuracy, improved transparency, and reduced bias. However, they also have some drawbacks, such as increased development and maintenance costs, complexity in designing human-machine interfaces, and the need for human feedback, which can limit scalability35. Continuous improvement of a model’s performance largely depends on this feedback loop. HITL models have been used in the context of segmentation of Earth imagery42 as well as artificial intelligence-assisted annotation of coral reef orthoimages43.
Manual data labelling
In this study, we classify all objects into three broad categories: biota, general unknown biology, and non-living objects, following the categories described in Version 1.4 of the Collaborative and Automated Tools for Analysis of Marine Imagery (CATAMI) classification scheme44. The CATAMI classification system is a hierarchical, taxonomy-based approach developed to classify marine habitats and biota observed in underwater imagery and videos. The system provides a framework for analysing individual points within images or videos, accommodating the limitations of identifying biota solely from imagery, such as the need for broad morphological categories for certain taxa like sponges. We classify the data in this study into 62 categories drawn from 4158 images, covering different biota, general unknown biology, and non-living objects. Of these classes, 59 were biota. We created the general unknown biology class to include any biota we could not categorise. The remaining two classes were non-living and comprised coral rubble and wooden debris. Table 1 shows the number of instances of each class in the dataset. We used Squidle+45, a web-based image annotation tool, to upload and annotate the data, and then downloaded each of the images and cropped the identified objects into individual image instances. This was necessary because some frame grabs contained multiple classes of objects, whereas our dataset contains individual instances of each class per image. An exception is the tubeworms class, for which it was not possible to separate individual objects due to their small size.
Two research assistants (experts) check each image label to reduce uncertainty and error in the labelling process. The first expert marks and labels each object on the image in Squidle+, and the second expert then reviews the label in the next stage while cropping out individual instances of each object from the frame grab. Uncertainty in species identification is accommodated by assigning labels at a coarser level within the hierarchy; therefore, species with low identification confidence are placed at a higher level within the hierarchy. We exclude blurred images and those with strong external interference (e.g. sand obstruction when the ROV was moving or sampling) from the dataset. Using the HITL approach described above, we curated 62 classes of objects; however, to remove extreme imbalance from the dataset, we only kept 33 classes. Figure 1 presents a subsample of the images from the dataset labelled using the CATAMI classification.
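To illustrate the cropping step described above, the following is a minimal sketch assuming Pillow; the annotation list, bounding-box coordinates, class names and file names are hypothetical stand-ins for the Squidle+ annotation export used in the study.

```python
# Sketch: crop labelled objects from a frame grab into per-class image instances.
# The annotations, box coordinates and file names below are hypothetical examples.
from pathlib import Path
from PIL import Image

annotations = [
    {"label": "Sponges_Massive", "box": (120, 340, 260, 480)},  # (left, upper, right, lower)
    {"label": "Quill_Sea_Pens", "box": (610, 90, 700, 210)},
]

frame_path = Path("frame_00012.png")        # hypothetical frame grab
frame = Image.open(frame_path)
out_root = Path("deepdive_crops")

for i, ann in enumerate(annotations):
    crop = frame.crop(ann["box"])
    class_dir = out_root / ann["label"]
    class_dir.mkdir(parents=True, exist_ok=True)
    crop.save(class_dir / f"{frame_path.stem}_{i}.png")
```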
Deep learning models
We select multiple image classifiers based on convolutional neural network (CNN)46 architectures that have been pre-trained using large datasets. These models leverage the power of transfer learning, which has become an integral part of computer vision tasks47,48,49 such as image classification47,50, object detection51,52, and image segmentation53, achieving remarkable performance.
The ResNet (Residual Network) model revolutionised deep learning by introducing skip connections that enable the training of extremely deep neural networks54. ResNet’s residual connections ensure valuable information is efficiently propagated through the layers. Traditional deep learning models faced a limitation where adding more layers might lead to degradation in performance due to the vanishing gradient problem55. ResNet’s skip connections allow gradients to flow directly through the network, facilitating the training of networks with hundreds or even thousands of layers. By alleviating the degradation issue, ResNet enables the construction of deeper networks that can capture increasingly complex features, leading to state-of-the-art performance on various computer vision tasks such as image classification56, object detection57, and semantic segmentation53.
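As an illustration of the skip connections described above, a minimal residual block is sketched below in Keras; it conveys the idea of adding the input back to the transformed output rather than reproducing the exact blocks of ResNet-50/101/152.

```python
# Sketch: a basic residual block with an identity skip connection (Keras).
# Illustrative only; not the exact ResNet block used in the benchmarked models.
import tensorflow as tf
from tensorflow.keras import layers

def residual_block(x, filters):
    shortcut = x                                     # identity skip connection
    y = layers.Conv2D(filters, 3, padding="same")(x)
    y = layers.BatchNormalization()(y)
    y = layers.ReLU()(y)
    y = layers.Conv2D(filters, 3, padding="same")(y)
    y = layers.BatchNormalization()(y)
    y = layers.Add()([shortcut, y])                  # gradients flow directly through the skip
    return layers.ReLU()(y)

inputs = tf.keras.Input(shape=(64, 64, 64))
outputs = residual_block(inputs, 64)                 # filters match the input channels here
block = tf.keras.Model(inputs, outputs)
```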
DenseNet58 (Densely Connected Convolutional Networks) presents a pioneering architecture that fosters maximum information flow between layers by establishing dense connections. DenseNet’s densely connected blocks encourage feature reuse and gradient flow, leading to better training efficiency and overall performance59. Unlike traditional CNNs, where each layer is connected only to its subsequent layer, DenseNet connects each layer to every other layer in a feedforward fashion within a dense block. This dense connectivity pattern encourages feature reuse and facilitates gradient flow throughout the network, mitigating vanishing gradient issues and promoting feature propagation58. Furthermore, DenseNet’s compact representation and parameter efficiency make it particularly well-suited for tasks with limited training data, as it enables the model to leverage information from earlier layers more effectively60. The intricate interconnections within DenseNet lead to impressive performance gains across various domains, including image classification61 and object detection62.
The Inception Network is a deep CNN recognised for its novel inception module, which efficiently captures features in data across multiple spatial scales63. The inception module employs parallel convolutional operations with varying kernel sizes within each layer, enabling the network to simultaneously learn features at different resolutions. This approach enhances the network’s capacity to represent intricate image patterns by extracting local and global features effectively. Additionally, the Inception Network incorporates dimensionality reduction techniques, such as 1 × 1 convolutions, to manage computational complexity while maintaining information flow and improving efficiency. The inception module helped capture multiscale features and reduce computational complexity by utilising parallel convolutional layers with varying filter sizes64. Subsequent iterations of the Inception Network include Inception-V2, Inception-V364, and Inception-ResNet54. Inception Networks have consistently demonstrated competitive performance on image recognition benchmarks such as ImageNet. Due to its versatility and effectiveness, the Inception Network has become influential in computer vision, addressing a wide range of problems including image classification and segmentation65,66.
Framework
In Stage 1 of our framework (Fig. 3), we acquire ROV data and store and annotate the video data on Squidle+. In Stage 2, we curate the Deepdive dataset by identifying and extracting different biota, general unknown biology, and non-living objects from the ROV videos using HITL object detection. We clean the dataset by removing images that are blurry or have other irregularities, and we exclude image classes with fewer than 15 instances from the final Deepdive dataset to reduce class imbalance. In Stage 3, we utilise three prominent pre-trained deep learning model families for automated image classification, with variations of ResNet, DenseNet, and the Inception Network. The selected models in our framework have been pre-trained on the ImageNet dataset67, which features 1000 classes of objects. In Stage 4, we use the Deepdive dataset to create benchmark results for the multi-class classification task using the respective pre-trained deep learning models. We apply transfer learning by replacing the final fully connected layer of each pre-trained model with a customised classifier layer that accommodates the distinct classes in the Deepdive dataset. We then retrain the respective models using the 33 classes of the Deepdive training dataset. We use the validation subset to monitor the model’s performance and prevent overfitting during the training process. We evaluate classification accuracy using the Receiver Operating Characteristic (ROC) curve and Area Under the Curve (AUC)68 scores on the test set.
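A minimal sketch of the transfer-learning step in Stages 3 and 4 is given below, assuming the Keras applications API; the optimiser, loss and other hyperparameters are illustrative choices rather than the exact settings used for the benchmark.

```python
# Sketch: replace the ImageNet classification head of a pre-trained backbone
# with a 33-class classifier for the Deepdive dataset (Keras). Hyperparameters
# are illustrative assumptions, not the paper's exact settings.
import tensorflow as tf
from tensorflow.keras import layers

num_classes = 33
base = tf.keras.applications.InceptionResNetV2(
    weights="imagenet", include_top=False, input_shape=(250, 250, 3))

model = tf.keras.Sequential([
    base,
    layers.GlobalAveragePooling2D(),
    layers.Dense(num_classes, activation="softmax"),   # customised classifier layer
])
model.compile(optimizer="adam",
              loss="categorical_crossentropy",
              metrics=["accuracy"])
```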
Data Records: Deepdive Dataset
The Deepdive dataset (available from69) contains a zipped directory of all 3994 deep-sea biota images, split into training, validation and testing folders using a 60:20:20 split. Each of the three folders contains 33 sub-folders, each representing a different class of images. The training subset consists of 2667 images, the validation subset contains 667 images, and the test subset comprises 660 images. We group the images into class sub-folders to ease the selection of specific classes when a model only needs to predict a handful of them. The sub-folder names are kept the same across the training, validation and testing subsets for consistency.
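A minimal sketch of loading the unzipped folders with Keras utilities is shown below; the extraction path is a placeholder, and the folder names follow the training/validation/testing layout described above.

```python
# Sketch: load the Deepdive folder structure into TensorFlow datasets.
# "deepdive/" is a hypothetical path to the unzipped Zenodo archive.
import tensorflow as tf

root = "deepdive"
train_ds = tf.keras.utils.image_dataset_from_directory(
    f"{root}/training", image_size=(250, 250), batch_size=32, label_mode="categorical")
val_ds = tf.keras.utils.image_dataset_from_directory(
    f"{root}/validation", image_size=(250, 250), batch_size=32, label_mode="categorical")
test_ds = tf.keras.utils.image_dataset_from_directory(
    f"{root}/testing", image_size=(250, 250), batch_size=32, label_mode="categorical",
    shuffle=False)                      # keep test order fixed for later evaluation
print(train_ds.class_names)             # the 33 class sub-folder names
```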
Technical Validation
We preprocess the data to create a coherent set of images that can be utilised directly for classification tasks. Our approach involved carefully selecting and applying various preprocessing techniques. We first isolated the classes of images that had at least 15 instances identified using HITL annotation. This was done to reduce the extreme imbalance in the Deepdive dataset; hence, we ended up with 33 distinct classes. Figure 4 shows the distribution of all images present in the Deepdive dataset. It clearly shows the imbalance in the dataset, even after removing approximately 30 classes due to the lack of images for those classes. The imbalance corresponds to a 33:1 ratio between the most and least populated classes. We were unable to improve this ratio, but we ensured adequate representation of all classes across the three subsets by randomly selecting images for each subset. We then standardised the image sizes by resizing them to a consistent resolution of 250 × 250 pixels, achieved through bilinear interpolation of the original image to ensure conformity to the new size. We rescaled the pixel values of the images to the range [0, 1], as the colour channels were originally in the range [0, 255]. This ensures homogeneity and compatibility with the classification architecture.
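The resizing and rescaling described above can be expressed as a small preprocessing function, sketched below with TensorFlow image operations as an assumption about the implementation.

```python
# Sketch: bilinear resize to 250 x 250 pixels and rescale pixel values
# from [0, 255] to [0, 1], as described in the preprocessing step.
import tensorflow as tf

def preprocess(image):
    image = tf.image.resize(image, (250, 250), method="bilinear")
    return tf.cast(image, tf.float32) / 255.0
```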
Usage Notes
In order to ensure the usability of our dataset, we performed testing on the dataset for the image classification task. We selected the deep learning image classifiers mentioned previously and created a framework to integrate the Deepdive dataset into this process. We downloaded the dataset from Zenodo and unzipped the files to obtain the full dataset. In the framework (Fig. 3), we loaded the images from the training and validation folders and trained seven different pre-trained models, namely ResNet-50, ResNet-101, ResNet-152, DenseNet-121, DenseNet-169, Inception-V3, and Inception-ResNet-V2, from the Keras library70. We ran 30 independent experimental runs of the training, validation, and testing cycle for each classifier, with random initialisation of the final fully connected layer of the model for each training cycle. This was done to create benchmark results for the dataset and to ensure consistent and reproducible outcomes. The models were trained using TensorFlow70 in an Anaconda environment running on a Windows system with an Nvidia 3090 graphics processor, an Intel i9 processor and 128 gigabytes of memory.
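A condensed sketch of the benchmarking loop is given below, reusing the dataset loaders from the earlier sketch; the epoch count, optimiser and single run per model are illustrative simplifications of the 30 independent runs reported here.

```python
# Sketch: retrain each of the seven pre-trained backbones with a fresh 33-class
# head and record test accuracy. Epochs and optimiser are illustrative assumptions.
import tensorflow as tf
from tensorflow.keras import layers

backbones = {
    "ResNet-50": tf.keras.applications.ResNet50,
    "ResNet-101": tf.keras.applications.ResNet101,
    "ResNet-152": tf.keras.applications.ResNet152,
    "DenseNet-121": tf.keras.applications.DenseNet121,
    "DenseNet-169": tf.keras.applications.DenseNet169,
    "Inception-V3": tf.keras.applications.InceptionV3,
    "Inception-ResNet-V2": tf.keras.applications.InceptionResNetV2,
}

results = {}
for name, constructor in backbones.items():
    base = constructor(weights="imagenet", include_top=False, input_shape=(250, 250, 3))
    model = tf.keras.Sequential([
        layers.Rescaling(1.0 / 255),                 # rescale pixels to [0, 1]
        base,
        layers.GlobalAveragePooling2D(),
        layers.Dense(33, activation="softmax"),      # freshly initialised classifier head
    ])
    model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])
    model.fit(train_ds, validation_data=val_ds, epochs=10)   # illustrative epoch count
    _, results[name] = model.evaluate(test_ds)
print(results)
```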
The results show that the Inception-ResNet model achieved the best classification accuracy of 65%, as given in Fig. 5. All the other models had lower accuracies, below 50%. We further analyse the AUC scores of each model, given in Table 3. The Inception-ResNet-V2 model had the best prediction performance across each of the classes in the dataset. Figure 7 shows the ROC-AUC plot for a single run of the Inception-ResNet-V2 model. We did not find a strong correlation between the number of instances present per class and the classification performance. This contradicts the typical behaviour of machine learning models on class-imbalanced datasets, where larger classes achieve better classification performance. The class Quill (Sea Pens) only had 15 instances in the dataset; however, the model performed exceptionally well in classifying it. We investigated this further by reviewing the ImageNet dataset67 used to pre-train the models and found that the Quill class was present in that dataset. Table 1 highlights all the classes in the ImageNet dataset that are also present in the Deepdive dataset. All of these classes have better classification performance because the model is pre-trained on them and then exposed to the same classes again when we retrain the classification layer. All these classes have an AUC over 0.94, regardless of the number of instances present for these classes. Apart from these special cases, we find good classification performance in classes with more images present; this is common for imbalanced datasets, and we see the same trend here. It is worth noting that the Stalked Erect sponge and Laminar Erect sponge classes are exceptions, with an AUC of over 0.98 but fewer than 20 instances per class. However, we believe that the Arborescent Stumpy sponge class provides sufficient training data for the model to identify these two classes, as they belong to the same sponge category and share many similar features.
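For reference, per-class AUC scores of the kind reported in Table 3 can be computed with scikit-learn as sketched below, assuming the model and dataset objects from the previous sketches and an unshuffled test set.

```python
# Sketch: one-vs-rest AUC per class from test-set predictions (scikit-learn).
# Assumes `model`, `train_ds` and an unshuffled `test_ds` from the sketches above.
import numpy as np
from sklearn.metrics import roc_auc_score

y_prob = model.predict(test_ds)                            # (n_samples, 33) softmax scores
y_true = np.concatenate([y.numpy() for _, y in test_ds])   # one-hot test labels
for k, class_name in enumerate(train_ds.class_names):
    print(class_name, roc_auc_score(y_true[:, k], y_prob[:, k]))
```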
Map of the Ribbon Reef canyon system, including Ribbon Reef 5, taken from71. The black lines in panel C represent the ROV dive transects for four dives. Panels A and B give the relative positions of the ROV dive sites.
We presented a deep learning framework with a human-in-the-loop approach for ROV image analysis, highlighting the effectiveness of the Inception-ResNet model. The Deepdive dataset is limited in size when compared to conventional deep learning datasets such as ImageNet67, which features more than 3.2 million images. However, Deepdive has gone through rigorous quality control through the human-in-the-loop methodology and is comparable to similar underwater image datasets outlined in the introduction10,18. Furthermore, some classes have limited images, leaving us with a class-imbalanced dataset. In order to maximise the performance of the classification models, we excluded classes with fewer than 15 images from the experiments. There is potential for future improvement through data augmentation for handling class imbalance and uncertainty quantification using Bayesian deep learning models.
Code availability
The code used for implementing the framework for classification performance evaluation and data loading can be accessed via the Github repository available at https://github.com/rvdeo/deepdive.
References
Hopley, D. The Geomorphology of the Great Barrier Reef: Quaternary Development of Coral Reefs. Coral reefs and islands (Wiley, 1982).
Hopley, D., Smithers, S. G. & Parnell, K. The Geomorphology of the Great Barrier Reef: Development, Diversity and Change https://www.cambridge.org/core/books/geomorphology-of-the-great-barrier-reef/97860D2EDB0E1EEE59DB1735BA2979A5 (Cambridge University Press, Cambridge, 2007).
De’ath, G., Fabricius, K. E., Sweatman, H. & Puotinen, M. The 27–year decline of coral cover on the Great Barrier Reef and its causes. Proceedings of the National Academy of Sciences 109, 17995–17999, https://doi.org/10.1073/pnas.1208909109 (2012).
Brodie, J. & Waterhouse, J. A critical review of environmental management of the ‘not so Great’ Barrier Reef. Estuarine, Coastal and Shelf Science 104–105, 1–22 (2012).
Lu, D. & Weng, Q. A survey of image classification methods and techniques for improving classification performance. International Journal of Remote Sensing 28, 823–870, https://doi.org/10.1080/01431160600746456 (2007).
Kohler, K. E. & Gill, S. M. Coral Point Count with Excel extensions (CPCe): A Visual Basic program for the determination of coral and substrate coverage using random point count methodology. Computers & Geosciences 32, 1259–1269 (2006).
Trygonis, V. & Sini, M. photoQuad: A dedicated seabed image processing software, and a comparative error analysis of four photoquadrat methods. Journal of Experimental Marine Biology and Ecology 424-425, 99–108 (2012).
Langenkämper, D., Zurowietz, M., Schoening, T. & Nattkemper, T. W. BIIGLE 2.0 - Browsing and Annotating Large Marine Image Collections. Frontiers in Marine Science 4, 83 (2017).
Friedman, A. SQUIDLE+. https://squidle.org.
Schoening, T. et al. Megafauna community assessment of polymetallic-nodule fields with cameras: platform and methodology comparison. Biogeosciences 17, 3115–3133 (2020).
Thiel, H. et al. The large-scale environmental impact experiment DISCOL—reflection and foresight. Deep Sea Research Part II: Topical Studies in Oceanography 48, 3869–3882 (2001).
Wright, R. M. et al. Benthic communities of the lower mesophotic zone on One Tree shelf edge, southern Great Barrier Reef, Australia. Marine and Freshwater Research 74, 1178–1192 (2023).
Pawlik, J. et al. Comparison of recent survey techniques for estimating benthic cover on Caribbean mesophotic reefs. Marine Ecology Progress Series 686, 201–211 (2022).
Bridge, T. C. L. et al. Topography, substratum and benthic macrofaunal relationships on a tropical mesophotic shelf margin, central Great Barrier Reef, Australia. Coral Reefs 30, 143–153, https://doi.org/10.1007/s00338-010-0677-3 (2011).
Sih, T. L. et al. Deep-Reef Fish Communities of the Great Barrier Reef Shelf-Break: Trophic Structure and Habitat Associations. Diversity 11, 26 (2019).
Williams, I. D. et al. Leveraging Automated Image Analysis Tools to Transform Our Capacity to Assess Status and Trends of Coral Reefs. Frontiers in Marine Science 6, 222 (2019).
Matabos, M. et al. Expert, Crowd, Students or Algorithm: who holds the key to deep-sea imagery ‘big data’ processing? Methods in Ecology and Evolution 8, 996–1004 (2017).
Lopez-Vazquez, V., Lopez-Guede, J. M., Chatzievangelou, D. & Aguzzi, J. Deep learning based deep-sea automatic image enhancement and animal species classification. Journal of Big Data 10, 37 (2023).
Bigham, K. T., Vardaro, M. F., Kelley, D. S. & VISIONS Team. Biology Catalog May 2024 Image Release Version 1.0 https://interactiveoceans.washington.edu/biology-catalog/biology-catalog-may-2024-image-release-version-1-0/ (2024).
Bell, K. L. C. et al. Low-Cost, Deep-Sea Imaging and Analysis Tools for Deep-Sea Exploration: A Collaborative Design Study. Frontiers in Marine Science 9, 873700 (2022).
CVision AI, Inc. Tailored video and image analytics | Tator. https://www.tator.io/.
Wu, Y., Kirillov, A., Massa, F., Lo, W.-Y. & Girshick, R. Detectron2. https://github.com/facebookresearch/detectron2 (2019).
Beijbom, O. et al. Towards Automated Annotation of Benthic Survey Images: Variability of Human Experts and Operational Modes of Automation. PLOS ONE 10, e0130312 (2015).
Kennedy, B. R. C. et al. The Unknown and the Unexplored: Insights Into the Pacific Deep-Sea Following NOAA CAPSTONE Expeditions. Frontiers in Marine Science 6, 480 (2019).
Zhao, Z.-Q., Zheng, P., Xu, S.-T. & Wu, X. Object Detection With Deep Learning: A Review. IEEE Transactions on Neural Networks and Learning Systems 30, 3212–3232 (2019).
Zou, Z., Chen, K., Shi, Z., Guo, Y. & Ye, J. Object Detection in 20 Years: A Survey. Proceedings of the IEEE 111, 257–276 (2023).
Zhang, R. et al. Survey on Deep Learning-Based Marine Object Detection. Journal of Advanced Transportation 2021, 1–18 (2021).
Moniruzzaman, M., Islam, S. M. S., Bennamoun, M. & Lavery, P. Deep Learning on Underwater Marine Object Detection: A Survey. In Blanc-Talon, J., Penne, R., Philips, W., Popescu, D. & Scheunders, P. (eds.) Advanced Concepts for Intelligent Vision Systems, vol. 10617, 150–160. http://link.springer.com/10.1007/978-3-319-70353-4_13 (Springer International Publishing, Cham, 2017).
Soriano, M., Marcos, S., Saloma, C., Quibilan, M. & Alino, P. Image classification of coral reef components from underwater color video. In MTS/IEEE Oceans 2001. An Ocean Odyssey. Conference Proceedings (IEEE Cat. No.01CH37295), vol. 2, 1008–1013 vol.2 (2001).
Sharma, R., Sankar, S. J., Samanta, S., Sardar, A. A. & Gracious, D. Image analysis of seafloor photographs for estimation of deep-sea minerals. Geo-Marine Letters 30, 617–626, https://doi.org/10.1007/s00367-010-0205-z (2010).
Rimavicius, T. & Gelzinis, A. A Comparison of the Deep Learning Methods for Solving Seafloor Image Classification Task. In Damaševičius, R. & Mikašytė, V. (eds.) Information and Software Technologies, Communications in Computer and Information Science, 442–453 (Springer International Publishing, Cham, 2017).
Zurowietz, M., Langenkämper, D., Hosking, B., Ruhl, H. A. & Nattkemper, T. W. MAIA—A machine learning assisted image annotation method for environmental monitoring and exploration. PLOS ONE 13, e0207498 (2018).
Dawkins, M. et al. An Open-Source Platform for Underwater Image and Video Analytics. In 2017 IEEE Winter Conference on Applications of Computer Vision (WACV), 898–906, https://ieeexplore.ieee.org/document/7926688/figures#figures (2017).
Jian, M. et al. Underwater image processing and analysis: A review. Signal Processing: Image Communication 91, 116088 (2021).
Wu, X. et al. A survey of human-in-the-loop for machine learning. Future Generation Computer Systems 135, 364–381, https://doi.org/10.1016/j.future.2022.05.014 (2022).
Brooke, B. & Schmidt Ocean Institute. Schmidt Ocean Institute Expedition Report: Seamounts, Canyons and Reefs of The Coral Sea. Tech. Rep., Zenodo https://doi.org/10.5281/zenodo.7308219 (2022).
Schmidt Ocean Institute. Seamounts, Canyons & Reefs of the Coral Sea. https://thredds.nci.org.au/thredds/catalog/fk1/GA0365_CoralSea_FK200802/Subastian/catalog.html (2020).
Zhang, X., Wang, L., Xie, J. & Zhu, P. Human-in-the-loop image segmentation and annotation. Science China Information Sciences 63, 1–3 (2020).
Yu, F., Seff, A., Zhang, Y., Song, S., Funkhouser, T. & Xiao, J. LSUN: Construction of a Large-scale Image Dataset using Deep Learning with Humans in the Loop https://doi.org/10.48550/arXiv.1506.03365 (2015).
Wang, Z. J., Choi, D., Xu, S. & Yang, D. Putting Humans in the Natural Language Processing Loop: A Survey. In Proceedings of the First Workshop on Bridging Human–Computer Interaction and Natural Language Processing, 47–52, https://doi.org/10.48550/arXiv.2103.04044 (2021).
Budd, S., Robinson, E. C. & Kainz, B. A survey on active learning and human-in-the-loop deep learning for medical image analysis. Medical Image Analysis 71, 102062 (2021).
Buscombe, D. et al. Human-in-the-Loop Segmentation of Earth Surface Imagery. Earth and Space Science 9, e2021EA002085 (2022).
Pavoni, G. et al. TagLab: AI-assisted annotation for the fast and accurate semantic segmentation of coral reef orthoimages. Journal of field robotics 39, 246–262 (2022).
Althaus, F. et al. A Standardised Vocabulary for Identifying Benthic Biota and Substrata from Underwater Imagery: The CATAMI Classification Scheme. PLOS ONE 10, e0141039 (2015).
Friedman, A. SQUIDLE+. https://squidle.org/.
Li, Z., Liu, F., Yang, W., Peng, S. & Zhou, J. A survey of convolutional neural networks: analysis, applications, and prospects. IEEE transactions on neural networks and learning systems 33, 6999–7019 (2021).
Showkat, S. & Qureshi, S. Efficacy of Transfer Learning-based ResNet models in Chest X-ray image classification for detecting COVID-19 Pneumonia. Chemometrics and Intelligent Laboratory Systems 224, 104534 (2022).
Rezende, E., Ruppert, G., Carvalho, T., Ramos, F. & De Geus, P. Malicious Software Classification Using Transfer Learning of ResNet-50 Deep Neural Network. In 2017 16th IEEE International Conference on Machine Learning and Applications (ICMLA), 1011–1014, http://ieeexplore.ieee.org/document/8260773/ (IEEE, Cancun, Mexico, 2017).
Ebrahimi, A., Luo, S. & Chiong, R. Introducing Transfer Learning to 3D ResNet-18 for Alzheimer’s Disease Detection on MRI Images. In 2020 35th International Conference on Image and Vision Computing New Zealand (IVCNZ), 1–6, https://ieeexplore.ieee.org/document/9290616/ (IEEE, Wellington, New Zealand, 2020).
Liu, S., Tian, G. & Xu, Y. A novel scene classification model combining ResNet based transfer learning and data augmentation with a filter. Neurocomputing 338, 191–206 (2019).
Dhillon, A. & Verma, G. K. Convolutional neural network: a review of models, methodologies and applications to object detection. Progress in Artificial Intelligence 9, 85–112, https://doi.org/10.1007/s13748-019-00203-0 (2020).
Zoph, B. et al. Learning Data Augmentation Strategies for Object Detection. In Vedaldi, A., Bischof, H., Brox, T. & Frahm, J.-M. (eds.) Computer Vision – ECCV 2020, Lecture Notes in Computer Science, 566–583 (Springer International Publishing, Cham, 2020).
Xia, K.-j, Yin, H.-s & Zhang, Y.-d Deep Semantic Segmentation of Kidney and Space-Occupying Lesion Area Based on SCNN and ResNet Models Combined with SIFT-Flow Algorithm. Journal of Medical Systems 43, 2, https://doi.org/10.1007/s10916-018-1116-1 (2018).
Szegedy, C., Ioffe, S., Vanhoucke, V. & Alemi, A. Inception-v4, Inception-ResNet and the Impact of Residual Connections on Learning. Proceedings of the AAAI Conference on Artificial Intelligence 31, https://ojs.aaai.org/index.php/AAAI/article/view/11231 (2017).
Hochreiter, S. The vanishing gradient problem during learning recurrent neural nets and problem solutions. International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems 6, 107–116 (1998).
Sarwinda, D., Paradisa, R. H., Bustamam, A. & Anggia, P. Deep learning in image classification using residual network (ResNet) variants for detection of colorectal cancer. Procedia Computer Science 179, 423–431 (2021).
Haque, M. F., Lim, H.-Y. & Kang, D.-S. Object detection based on VGG with ResNet network. In 2019 International Conference on Electronics, Information, and Communication (ICEIC), 1–3 (IEEE, 2019).
Iandola, F. et al. Densenet: Implementing efficient convnet descriptor pyramids. arXiv preprint arXiv:1404.1869 (2014).
Huang, G., Liu, Z., van der Maaten, L. & Weinberger, K. Q. Densely Connected Convolutional Networks. http://arxiv.org/abs/1608.06993 (2018).
Abai, Z. & Rajmalwar, N. Densenet models for tiny imagenet classification. arXiv preprint arXiv:1904.10429 (2019).
Adegun, A. A. & Viriri, S. FCN-based DenseNet framework for automated detection and classification of skin lesions in dermoscopy images. IEEE Access 8, 150377–150396 (2020).
Zhai, S., Shang, D., Wang, S. & Dong, S. DF-SSD: An improved SSD object detection algorithm based on DenseNet and feature fusion. IEEE access 8, 24344–24357 (2020).
Szegedy, C. et al. Going deeper with convolutions. In Proceedings of the IEEE conference on computer vision and pattern recognition, 1–9 (2015).
Szegedy, C., Vanhoucke, V., Ioffe, S., Shlens, J. & Wojna, Z. Rethinking the inception architecture for computer vision. In Proceedings of the IEEE conference on computer vision and pattern recognition, 2818–2826 (2016).
Jena, B., Nayak, G. K. & Saxena, S. Convolutional neural network and its pretrained models for image classification and object detection: A survey. Concurrency and Computation: Practice and Experience 34, e6767 (2022).
Khan, A., Sohail, A., Zahoora, U. & Qureshi, A. S. A survey of the recent architectures of deep convolutional neural networks. Artificial intelligence review 53, 5455–5516 (2020).
Deng, J. et al. ImageNet: A large-scale hierarchical image database. In 2009 IEEE Conference on Computer Vision and Pattern Recognition, 248–255, https://ieeexplore.ieee.org/abstract/document/5206848 (2009).
Bradley, A. P. The use of the area under the ROC curve in the evaluation of machine learning algorithms. Pattern Recognition 30, 1145–1159 (1997).
Deo, R. et al. Deepdive Dataset (ROV multiclass image classification). Zenodo https://doi.org/10.5281/zenodo.10724993 (2024).
Dillon, J. V. et al. Tensorflow distributions. arXiv preprint arXiv:1711.10604 (2017).
Webster, J. M. et al. Late Pleistocene history of turbidite sedimentation in a submarine canyon off the northern Great Barrier Reef, Australia. Palaeogeography, Palaeoclimatology, Palaeoecology 331-332, 75–89 (2012).
Acknowledgements
The authors acknowledge support from the Australian Research Council through grant IC190100031 for providing the Open Access Fees.
Author information
Contributions
R.D. contributed to the code development and prepared the original draft of the paper; R.C. contributed to experiment design; R.D., C.M.J., J.M.W., T.S. and R.C. contributed to analysis and editing final draft; K.W., C.Z. and R.D. curated the data for publication. All the authors reviewed the final draft of the manuscript.
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Deo, R., John, C.M., Zhang, C. et al. Deepdive: Leveraging Pre-trained Deep Learning for Deep-Sea ROV Biota Identification in the Great Barrier Reef. Sci Data 11, 957 (2024). https://doi.org/10.1038/s41597-024-03766-3