Introduction

Waste generation represents an inevitable part of the manufacturing process. Considering that global production of municipal solid waste is expected to reach 3.40 billion tons by 2050 (ref. 1), it poses a threat to the environment but also emerges as a pivotal resource for remanufacturing. If inadequately managed, the increase in waste volumes will hinder the achievement of UN Sustainable Development Goals (SDGs)2, such as Good Health and Well-being (SDG3), Clean Water and Sanitation (SDG6), Sustainable Cities and Communities (SDG11), Life Below Water (SDG14), and Life on Land (SDG15). If managed effectively, this surge will reduce resource scarcity3,4, decrease production costs5,6, and minimize energy dependence7. These benefits may be especially appealing to industrial enterprises, as proper waste management has a direct impact on production costs and resource expenses. To highlight the environmental benefits, proper waste management in industry could reduce municipal solid waste disposal by more than 64%8. Advancements on this path should strengthen remanufacturing, which is expected to reach an annual value of up to 90 billion euros by 2030, adding 65,000 new jobs in the EU alone9. To initiate this change, there is an increasing trend toward tightening policies that regulate how industrial enterprises should treat their waste to keep technological development sustainable10. However, although LEAN and other modern manufacturing-management philosophies promote “zero waste” as one of the top priorities in manufacturing-remanufacturing supply chains11, practice has shown that this goal is unlikely to be achieved12. Instead, companies tend to reduce and reuse waste, while at the end of the manufacturing process there remains a portion of raw or processed materials that needs to be sorted for recycling or disposal.

Although the waste sorting industry tends to be automated, in most cases a portion of waste (commonly at the end of the sorting chain) still has to be sorted manually by human operators. A characteristic of these waste sorting tasks and the corresponding workplaces is the limited number of materials or objects that need to be sorted. This distinguishes industrial waste sorting from the sorting of raw municipal waste, which may require accurate recognition of hundreds of different materials and products. Furthermore, with the expansion of “small batch and customized manufacturing” in developed countries, there is a growing need for flexible production lines that are capable of switching between different products with minimal effort. Considering such a trend, this study focuses on the niche of industrial waste sorting, which requires more generic and adaptable solutions than traditional ones designed for fixed production lines or municipal waste.

Automating waste sorting processes with robots has proven its potential to increase productivity, lower expenses, and minimize errors while enhancing the overall efficiency of material use and recycling13,14. A typical robotic waste sorting system is shown in Fig. 1a; it consists of a pick-and-place solution operating atop a conveyor. The robot is guided by a vision system that includes one or more cameras and an algorithm designed to comprehend image content, determine each item’s location, and execute the picking task. In these terms, the utility of the whole waste sorting system critically depends on the ability of the computer vision algorithms to recognize various types of waste and on the robustness of grippers that must pick and place highly variable objects15.

Fig. 1 Overview of the proposed procedure for generic waste sorting.

This article primarily focuses on advancements in robotic vision for waste sorting, a comparative overview of which is given in Table 1. Here, we emphasize a series of challenges that justify the need for the generic procedure proposed in this study. First, waste sorting problems may be highly specific and heterogeneous (in terms of the considered materials and equipment used in Table 1), so computer vision algorithms frequently need to be retrained for each specific industrial application. In terms of approaches used, most studies are based on detection, instance segmentation, and machine learning, while a portion of studies combined these approaches to obtain better sorting accuracy. For the targeted waste sorting in small batch and flexible production, this means that each new switch or extension of production will require additional investments in the development and annotation of new datasets, as well as in the training of computer vision algorithms (e.g., segmentation or detection). In contrast to large facilities that sort municipal waste into a limited number of material types (e.g., metal, wood, plastic, glass, and paper), the further sorting of specific metals (e.g., aluminum, copper, and brass) or specific types of plastic (polypropylene, polyvinyl chloride, polyethylene terephthalate, etc.) represents a challenge from the automation viewpoint. In these terms, the wider application of robotic solutions depends on reducing the costs and complexity of adopting the computer vision module in Fig. 1. This study identifies the segmentation and/or detection steps as a bottleneck, since the manual collection and annotation of waste data require a significant amount of time and investment. By eliminating this step, waste separation could be simplified into one step: the development of deep learning classifiers, which requires only sorting images of specific objects/materials into corresponding class folders.
Accordingly, this study hypothesizes that the barrier to adopting robotic waste sorting in industrial setups could be significantly reduced by enabling versatile robot picking of unknown waste objects.

Table 1 Comparative review of related studies on the topic of computer vision-based waste sorting.

Related work

Previous research approached the challenge of waste sorting as a computer vision-based detection task, relying on combining feature engineering with machine learning algorithms, while the most recent studies have adopted various deep learning models and architectures to address this problem. In16, the authors introduced an automatic flexible sorting system capable of handling objects with high variability in shape, position, and class, avoiding production stoppages when new items are introduced. It detects new products on the conveyor and generates labeled real images online, which are used to retrain the deep learning network. Chen et al. designed and built a robot prototype for construction waste recycling, achieving real-time navigation, a deep learning-based detection method (under different illumination and spatial density conditions), and a 3D object pickup strategy for the accurate identification and stable grasping of waste items17. Two deep learning techniques (a convolutional neural network (CNN) and Graph-LSTM) that can recognize waste products (six classes) on a belt conveyor are presented in18. A vision-based architecture for the effective sorting of parts based on shape and material properties is proposed in19, introducing a novel deep learning multi-modal approach in which multiple parallel auto-encoders extract spatio-spectral information from RGB and multi-spectral sensors and project it into a common latent space. YOLOv8 was used to localize waste along the conveyor belt. Furthermore, a Multispectral Mixed Waste Dataset (MMWD) was produced, containing multi-spectral data of seven plastic and wood waste classes. In20, the authors developed a machine learning procedure for the recognition of construction and demolition waste fragments from RGB images using three classifiers (CNN, gradient boosting (GB) decision trees, and a multilayer perceptron) leveraging selected feature extraction, enhancing classification speed and accuracy.
To go beyond single-label waste classification, a multi-task learning architecture based on a CNN is proposed to simultaneously identify and locate wastes in images21. An e-waste classification model is developed to classify metallic and non-metallic fractions into the broad categories of metal, PCB, plastic, and glass, using a feature vector consisting of mean intensity, standard deviation, and image sharpness extracted from thermograms22. Gundupalli et al. reported a system for classifying useful recyclables (e.g., iron, paper, plastic, etc.) from thermal imaging samples of municipal solid waste23. A new metaheuristic method with deep transfer learning for the detection and classification of industrial waste was presented in24. The model has two key phases: waste object recognition (a YOLOv5 object detector with the Harris Hawks Optimization algorithm) and waste object classification (a stacked sparse autoencoder model). In25, the authors developed an optimized hybrid deep learning model (combining a CNN and a Deep Belief Network) for waste classification that boosted performance, predicting and classifying waste with increased accuracy. Jahanbakhshi et al. addressed the problem of classifying carrots by shape (regular and irregular) to manage and control their waste, using an improved CNN that learns a pooling function combining average pooling and max pooling26. A deep learning object detection network using X-ray images of the internal structure of waste electric and electronic equipment to separate and sort batteries is presented in27. As a comparative summary of previous studies on this topic, an overview of their major differences in high-level approaches, algorithms, considered waste types, and datasets is given in Table 1.


The rest of the paper is organized as follows. Related studies are reviewed in Sect. 2 “Related work”. The methodology for versatile waste sorting is presented in Sect. 3 “Methods”, followed by the experimental setup and results in Sect. 4 “Experiments and results”. Section 5 “Discussion” provides a discussion of the results and the advantages of the proposed pipeline. The paper ends with concluding remarks and future work perspectives in Sect. 6 “Conclusion”.

Methods

In this study, the waste sorting task is split into two sub-tasks: 1) localization of waste objects on a conveyor, and 2) their classification into corresponding classes (Fig. 2). The task of waste object localization is solved using the Segment Anything Model (SAM), of which five variants are considered in this study.

Fig. 2 Architectures of deep learning algorithms used for waste sorting. a) Raw image captured from an industrial camera; b) SAM architecture; c) SAM outputs for a single detected object; d) EfficientSAM architecture; e) FasterSAM architecture; f) Cropped waste object; g) MobileNetV2 architecture; h) Sample image of municipal waste; i) SAM output for the considered sample image; j) Cropped waste objects from the considered sample image.

The base SAM is a zero-shot segmentation model trained on 11 million images and 1.1 billion segmentation masks28. Briefly, it is a foundation model that produces accurate segmentation masks from input images for a variety of input prompts. The SAM architecture is composed of three core modules: an image encoder, a prompt encoder, and a mask decoder (Fig. 2b). The image encoder generates one-time image embeddings using a masked auto-encoder29 and a pretrained Vision Transformer (ViT-H)30. The prompt encoder is designed for the efficient encoding of various prompt modes. A prompt is information that indicates what to segment in an image; it can be foreground/background points, a bounding box, a mask, or text. In this study, we used point prompts, which are represented by positional encodings31 summed with learned embeddings. By combining the image embedding with the prompt encodings, the mask decoder generates segmentation masks using prompt self-attention32 and cross-attention32 in two directions (from prompt to image embedding and back). The advantage of the SAM is that one can run the image encoder only once and then prompt the model multiple times using the same image embeddings (which can provide masks for new prompts in ~ 50 ms). This characteristic makes the SAM well suited for complex vision tasks that need to be performed iteratively or under highly variable conditions (e.g., image annotation or perception in unseen environments).

The Fast Segment Anything (FastSAM) model was proposed to address the bottleneck of the computationally expensive transformer branch, which limits real-world industrial SAM applications. This is achieved by decoupling the segment anything task into all-instance segmentation (input size 1024) and subsequent prompt-guided selection of regions of interest33. The first step is done using the YOLOv8-seg34 object detector, which contains an instance segmentation branch based on YOLACT35. FastSAM achieved performance comparable with the original SAM at 50 × higher run-time speed, trained on only 2% of the SA-1B dataset (https://ai.meta.com/datasets/segment-anything) released with the original SAM.

The Faster Segment Anything (FasterSAM) model was proposed with the primary aim of bringing SAM to mobile devices, which is why its authors also term it MobileSAM36. The MobileSAM study starts from the recommendation in the original SAM paper that the default ViT-H encoder (632 M parameters) could be replaced with a retrained lightweight alternative (e.g., ViT-L with 307 M or ViT-B with 86 M parameters), while aiming to reduce the required computational power to a single GPU. Specifically, 256 A100 GPUs and 68 h are needed to train the ViT-H encoder, and 128 GPUs and multiple days to retrain with ViT-L or ViT-B, which represents a barrier for the majority of research labs wishing to participate and contribute to the topic. The single-GPU goal is achieved by decoupling the image encoder and mask decoder: first, knowledge is distilled from the heavy image encoder (ViT-H) into a lightweight one (ViT-Tiny)37, and then the original mask decoder is fine-tuned to better align with the distilled image encoder. MobileSAM is 60 × smaller than the original SAM (with comparable performance), and 5 × faster and 7 × smaller than the concurrent FastSAM.

MobileSAMv2, as its name indicates, represents an improved version of MobileSAM. By adopting YOLOv8 for efficient detection with bounding boxes, the authors replaced the default grid-search point prompts with object-aware box prompts38. Overall, the authors reported a 16 × speed-up in performing the segment anything task while maintaining performance competitive with the baseline SAM (a 3.6% performance boost for zero-shot object proposal on the LVIS dataset).

EfficientSAM extends the original SAM by leveraging SAM-leveraged masked image pretraining (SAMI) to produce pretrained lightweight ViT backbones for the segment anything task39, in combination with the MAE pre-training method29. Briefly, the authors used the SAMI-pretrained lightweight image encoders and a mask decoder to build EfficientSAMs and fine-tuned the models on SA-1B for the segment anything task. On this task, the EfficientSAMs outperformed MobileSAM and FastSAM by a large margin (~ 4 AP) while having comparable complexity.

As this study focuses on developing a procedure for versatile robot picking of unknown or highly damaged/deformed waste objects, the SAM is used to improve: 1) the development of the object classification dataset, and 2) the localization of waste objects that need to be picked by robots. By using the SAM, data collection and annotation for classification is reduced to acquiring a sample video and sorting the extracted object images into corresponding class folders. Besides the expected list of classes, we added one more, “Unknown”, class into which the robot separates unseen waste objects that, at the moment, are not considered for recycling. In this way, unknown waste objects can be segregated from the recycling process or incorporated into the classification model as a separate class afterward.
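The per-object extraction that populates the class folders can be illustrated with a short sketch. The `crop_object` helper and the toy arrays below are our own illustration, assuming a boolean instance mask such as SAM produces; they are not the authors' code.

```python
import numpy as np

def crop_object(image: np.ndarray, mask: np.ndarray, pad: int = 0) -> np.ndarray:
    """Crop a single segmented object from `image` using a boolean `mask`
    and zero out background pixels inside the crop, as done before
    forwarding the object image to a classifier."""
    ys, xs = np.nonzero(mask)
    if ys.size == 0:
        raise ValueError("empty mask")
    y0, y1 = max(ys.min() - pad, 0), min(ys.max() + 1 + pad, image.shape[0])
    x0, x1 = max(xs.min() - pad, 0), min(xs.max() + 1 + pad, image.shape[1])
    crop = image[y0:y1, x0:x1].copy()
    crop[~mask[y0:y1, x0:x1]] = 0  # remove background within the bounding box
    return crop

# toy example: a 6x6 "image" with one 2x3 object
img = np.arange(36, dtype=np.uint8).reshape(6, 6)
m = np.zeros((6, 6), dtype=bool)
m[2:4, 1:4] = True
obj = crop_object(img, m)  # 2x3 crop of the masked region
```

Each crop produced this way is saved directly into its class folder, so no bounding-box or mask annotation tool is involved.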

By using the SAM output instance masks (Fig. 2c), each object is cropped, its background is removed, and it is forwarded to the MobileNetV2 classifier (Fig. 2g). MobileNetV2 is a compact architecture based on an inverted residual structure, where the input and output of the residual block are thin bottleneck layers, while the intermediate expansion layer uses lightweight depthwise convolutions to filter features as a source of non-linearity40. Besides MobileNetV2, we also considered VGG1941, DenseNet42, SqueezeNet43, Inception-v344, and ResNet45 as alternative classification architectures. All classification models were pretrained on the ImageNet dataset46 and fine-tuned using the PyTorch framework. To increase the robustness of the considered models, we performed online augmentation (random rotation ± 30°, random flip, random crop, and Gaussian noise) with a probability of 20%, while the dataset was randomly split into training (70%), validation (15%), and test (15%) subsets. The training was done using the Adam optimization algorithm47 with the cross-entropy loss function and an initial learning rate of 1e-4 (which was decreased by a factor of 0.1 every 7 epochs).
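The training configuration above (70/15/15 random split, Adam with an initial learning rate of 1e-4 decayed by a factor of 0.1 every 7 epochs) can be sketched in plain Python. The helper names are illustrative assumptions, and the step-decay function mirrors what PyTorch's `StepLR` scheduler would compute.

```python
import random

def split_dataset(items, seed=0, ratios=(0.70, 0.15, 0.15)):
    """Randomly split items into train/validation/test subsets (70/15/15)."""
    items = list(items)
    random.Random(seed).shuffle(items)
    n_train = int(len(items) * ratios[0])
    n_val = int(len(items) * ratios[1])
    return (items[:n_train],
            items[n_train:n_train + n_val],
            items[n_train + n_val:])

def step_decay_lr(epoch, base_lr=1e-4, gamma=0.1, step=7):
    """Learning rate at a given epoch: decayed by `gamma` every `step` epochs
    (the schedule applied to Adam in this study)."""
    return base_lr * gamma ** (epoch // step)

train, val, test = split_dataset(range(100))
```

In the actual pipeline the same split and schedule would be wired into a PyTorch training loop (e.g., `torch.optim.Adam` plus `torch.optim.lr_scheduler.StepLR(optimizer, step_size=7, gamma=0.1)`).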

As indicated in Fig. 1, one of the classes is marked as “Unknown”, because in waste sorting unexpected materials commonly occur. Moreover, the Unknown class enables the extension of the dataset and the corresponding classifier once the frequency of its appearance exceeds a certain limit. At that moment, there will already be a sufficient number of unsorted objects, which can instantly be used for new training. The workflow of this continuous improvement is illustrated in the pseudo-code shown in Table 2.

Table 2 Pseudo-code of the proposed generic procedure for versatile waste sorting.
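The continuous-improvement workflow summarized in Table 2 can be approximated by the following minimal sketch. The threshold value and the `SortingMonitor` helper are illustrative assumptions of ours, not the paper's implementation.

```python
from collections import Counter

UNKNOWN = "Unknown"

class SortingMonitor:
    """Tracks classifier outputs and signals when the 'Unknown' buffer is
    large enough to justify fine-tuning the classifier with a new class."""

    def __init__(self, threshold=50):  # illustrative threshold
        self.threshold = threshold
        self.counts = Counter()
        self.unknown_buffer = []  # images set aside for future training

    def record(self, label, image=None):
        """Register one classification; return True when retraining is due."""
        self.counts[label] += 1
        if label == UNKNOWN:
            self.unknown_buffer.append(image)
        return self.needs_retraining()

    def needs_retraining(self):
        return len(self.unknown_buffer) >= self.threshold

# toy run: the third "Unknown" object triggers the retraining flag
monitor = SortingMonitor(threshold=3)
flags = [monitor.record(lbl)
         for lbl in ["metal", "Unknown", "Unknown", "plastic", "Unknown"]]
```

Once the flag is raised, the buffered images can be sorted into a new class folder and the classifier fine-tuned, exactly as the Unknown-class mechanism in the text describes.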

Experiments and results

All the implementations were done by using the Python programming language, along with the PyTorch library. All the computations were performed on the Lambda workstation with the AMD Threadripper 3970X (32 cores, 3.79 GHz processor), 128 GB RAM, and two Titan RTX (24 GB) + NVLink GPUs.

We gratefully acknowledge the contributions of the following public data sources that were used for developing our datasets and the figures presented in our manuscript: ZeroWaste-f48, RoboFlow Waste Conveyor49, WaRP-D50, ANTLab51, the Kaggle Waste Materials Classification Data dataset52, and the FloW53 dataset.

To assess the proposed procedure, we developed four datasets corresponding to different use cases: a) Floating waste, b) Municipal waste, c) E-waste, and d) Smart bin waste. The Municipal waste dataset (1500 images) was developed by combining three public datasets. We randomly selected 1000 (out of 4503) images from the ZeroWaste-f dataset (containing cardboard, soft plastic, rigid plastic, and metal)48, 300 (out of 1518) images from the RoboFlow Waste Conveyor dataset (Purdue University) containing cardboard, glass, metal, paper, and plastic49, and 200 (out of 522) images from the WaRP-D dataset containing various types of bottles (glass, plastic), cardboard, and canisters (plastic, cans)50. From each image, we extracted objects using the SAM algorithm, which were then manually stratified into four classes corresponding to the following materials: plastic, paper, metal, and glass (the remaining materials were assigned to the “Unknown” class). The E-waste dataset (2100 images) was developed using images collected by the authors from an e-recycling company in Serbia (Fig. 3). The company operators manually sorted and provided 700 pieces (of varying sizes) of each material, which were then imaged and manually split into three class groups (aluminum, copper, and brass). The Smart bin waste dataset (6587 images) was developed by combining two public datasets. We randomly selected 4600 (out of 5199) images from the ANTLab (Politecnico di Milano) Smart Waste Bin dataset containing glass, metal, paper, and plastic objects (the remaining images were grouped into the “Unknown” class)51 and 1987 images from the Kaggle Waste Materials Classification Data dataset52. The Floating waste dataset (1600 images) was developed by combining two public datasets: from the FloW dataset53 and the RoboFlow floating waste dataset54, we extracted 1000 images containing plastic bottles and 600 images containing objects classified as “Unknown”.
The metrics selected for the evaluation and comparison of the developed models included: \(Accuracy= \frac{{T}_{p}+{T}_{n}}{{T}_{p}+{F}_{p}+{T}_{n}+{F}_{n}}\), \(Precision=\frac{{T}_{p}}{{T}_{p}+{F}_{p}}\), \(Recall=\frac{{T}_{p}}{{T}_{p}+{F}_{n}}\), and \(F1 score=2\frac{Recall\cdot Precision}{Recall+Precision}\), where Tp are true positive, Tn true negative, Fp false positive, and Fn false negative classifications. The obtained results are given in Table 3. In Table 4, we compare the proposed pipeline with the existing object detection (using the YOLOv11 algorithm55) and instance segmentation (using the Mask R-CNN algorithm56) approaches. The classification accuracy metrics shown in Table 4 were obtained by manual assessment of the YOLOv11 and Mask R-CNN output objects and classes by human experts.
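For reference, the four metrics can be computed directly from the confusion-matrix counts; this small self-contained sketch follows the formulas above (the function name is ours).

```python
def classification_metrics(tp, tn, fp, fn):
    """Accuracy, precision, recall, and F1 score from confusion-matrix
    counts, matching the evaluation formulas used in this study."""
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f1

# worked example: 8 true positives, 5 true negatives,
# 2 false positives, 1 false negative
acc, prec, rec, f1 = classification_metrics(tp=8, tn=5, fp=2, fn=1)
```

Equivalent results can be obtained with scikit-learn's `precision_score`, `recall_score`, and `f1_score` on per-sample labels.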

Fig. 3 Considered use cases from the waste sorting industry.

Table 3 Performances of the developed deep learning algorithms.
Table 4 Performances of alternative approaches based on object detection and instance segmentation algorithms.

Discussion

In terms of scene complexity and variation in object appearance, the E-waste dataset is the most constrained. It contains only three materials/classes, imaged on the conveyor under laboratory conditions, which resulted in highly accurate results (97%). We emphasize that this dataset is the most relevant representation of the considered waste classification in small batch manufacturing companies. The other three datasets were created using public datasets. Although images were randomly selected and waste object extraction from images was done automatically using the SAM, we preferred to exclude noisy or low-quality images from our datasets. The obtained accuracies were between 86 and 97% in all three cases. For the Smart bin waste, the limited number of considered waste objects (glass, metal, paper, and plastic) makes it similar in complexity and performance to the E-waste dataset. Municipal waste may be considered the most complex and heterogeneous in terms of object appearance. The complexity also comes from the fact that many objects contain multiple materials (e.g., plastic bottles carry paper labels). In such cases, the accuracy of the classifier is determined by the training strategy and how the corresponding dataset is developed. We emphasize that the lowest performance was obtained on the Floating waste dataset, which may be explained by the fact that images were captured from different and sub-optimal viewpoints, ranging from above to horizontal (while objects were far away and only partially visible above the water) (Fig. 4).

Fig. 4 Comparison of SAM and FastSAM outputs for different waste types.

As mentioned in the introduction, in contrast to municipal waste sorting, which has undetermined variations in the number and appearance of object classes, in the manufacturing industry there is commonly a limited set of materials/objects to be classified. Even in the waste sorting industry, at the end of the sorting chain (e.g., after sorting materials into metal, wood, plastic, etc.), there is a need to perform a final sorting into a certain number of classes (e.g., aluminum, copper, and steel). On the one hand, this significantly simplifies waste sorting; on the other hand, recognizing various types of materials requires constant retraining of the algorithms listed in Table 1. As indicated in48, popular detection, segmentation, and instance segmentation algorithms (such as Mask R-CNN, TridentNet, and DeepLabV3) struggle to generalize on waste datasets that include in-the-wild data. This is reasonable, as at the deployment stage they will be constantly exposed to unseen objects, since waste objects may be highly damaged and unrecognizable compared to the reference ones used at the training stage. As a solution to this challenge, some studies used blob detection of foreground objects (on the moving conveyor)16. However, this approach fails to separate overlapping objects (e.g., municipal waste in Fig. 3), which is the fundamental problem in waste separation. Our experiments indicate that the SAM fits this challenge well, especially in manufacturing industry-related problems (e.g., E-waste in Fig. 3). The authors of18 developed a real-time deep learning model combining a CNN and Graph-LSTM for solid waste class detection and prediction, achieving high accuracy in real-world situations. Even the generic procedure for simultaneous localization and recognition of waste types proposed in21 requires object separation in images: due to multi-label classification, each image in the dataset must carry bounding box annotations and multiple labels.
The two-stage methods that combine detection and classification imply that both detection and classification modules need to be retrained in case of a new object or material class24,27. This increases the complexity of the entire process due to the necessity of additional annotation, which is intricate and time-consuming. Finally, when it comes to approaches that use only classification for computer vision-based waste sorting, they fall short in distinguishing waste objects in cases of multiple overlapping objects in a scene25,26.

Compared to the traditional approaches (object detection, object segmentation, and instance segmentation), the advantage of using the SAM + MobileNetV2 pipeline in waste sorting is two-fold. Besides easing object detection and separation, it is also very suitable for speeding up data preparation during the development of image classifiers. We report these two features as a key advantage over the conventional waste sorting pipelines listed in Tables 1 and 4. Specifically, the results of our experiments in Table 4 indicate that the proposed approach slightly outperforms conventional object detection and instance segmentation algorithms in terms of accuracy. However, we underline that retraining existing artificial intelligence systems assumes significant investments in developing new datasets, while re-running existing code for model fine-tuning requires very little effort. Considering that, and the specificity of small batch production systems, the major advantage of the proposed pipeline is a significant reduction of the costs and complexity of developing or switching to new waste sorting use cases. In our experiments, the development of the novel datasets took only a few hours and required no annotation tools. This is an important distinction from previous approaches, which required the development of datasets for object detection and segmentation, a time-consuming process. To ease this process and the continuous improvement of the system, we recommend incorporating one extra (“Unknown”) class, which enables the system to separate unseen waste objects that are not currently considered for recycling or could not be accurately classified. In this way, unknown waste objects can be segregated from recycling or forwarded to human operators for manual sorting.
At the same time, if the frequency of a specific unseen class increases above some threshold, its images could be incorporated into the fine-tuning of the classification model as a separate class. This feature is useful in situations where the occurrence of waste material varies over time, or where recycling rules may change due to some disruption in supply chain or regulation policies.

In summary, while the described procedure is designed to be generic and has notable advantages, it is important to be aware of its potential limitations. First, the validation of the proposed procedure is limited to its specific applications in waste sorting in the manufacturing industry. Moreover, this study is focused on a specific niche, defined as “waste sorting in small batch and flexible manufacturing”. Considering the results obtained in the four use cases, the performance of the procedure decreases as the complexity and diversity of the datasets increase. This is related to the abilities of the SAM itself, as it is trained on 11 million natural images, of which a negligible portion were waste images. In this regard, future research may be directed toward fine-tuning the SAM to better separate municipal waste objects.

At this stage of development, we conclude that the proposed procedure based on combining the SAM model and the MobileNetV2 classifier is suitable for the considered small batch manufacturing waste sorting, which is confirmed on the e-recycling and smart bin waste datasets. Implementing effective waste management practices in small batch manufacturing enterprises will contribute to local and global sustainability efforts. Locally, the proposed solution will increase resource efficiency and reduce the costs associated with the disposal of manufacturing byproducts. Furthermore, an automated waste selection procedure will reduce the risk of injury by allocating human operators to other production sectors, increasing overall enterprise productivity. As a result, this will enhance the circular economy within micro, small, and medium-sized enterprises, whose liquidity is considered critical to the creation of sustainable societies. Globally, these advances will positively impact shifts in megatrends, such as rapid urbanization, emerging markets, climate change, and resource depletion.

Conclusion

Waste generation is an inevitable byproduct of manufacturing processes that presents a substantial challenge to environmental well-being. Recent advancements in robotic waste handling offer promising solutions to this problem, especially in the context of small batch and customized manufacturing or disassembly. In this study, we proposed a generic deep learning-based approach for versatile waste sorting and evaluated it on four different use cases: floating waste, municipal waste, e-waste, and smart bins. The experimental results showed that the proposed fusion of the SAM and the MobileNetV2 classifier provides a series of advantages over previous studies based on conventional object detection and object segmentation algorithms. Specifically, our generic procedure simplifies separation into one step: the development of deep learning classifiers, which requires only sorting images of specific objects/materials into corresponding class folders (in contrast to the time-consuming annotation of bounding boxes and semantic masks). The obtained sorting accuracy ranged from 86% on the heterogeneous floating waste dataset to 97% on the e-waste dataset developed in this study as a representative use case of industrial waste sorting. It is concluded that the proposed approach could facilitate automation in the waste industry, increase productivity, lower expenses, and minimize errors in the robotic sorting of industrial waste57.