Fig. 9: IMPFN structure diagram.
From: Multimodal prototype fusion network for paper-cut image classification

Once the text and image prototypes have been obtained, the two are fused directly to form the class prototype. At inference, CLIP features are extracted from the input image and their similarity to each class prototype is computed, yielding the classification result. This method enables efficient classification of previously unseen classes, thereby complementing the experimental dataset.
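The fusion-and-similarity step above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the fusion is shown as a simple weighted average (the weight `alpha` is an assumption), similarity is taken as cosine similarity, and random vectors stand in for actual CLIP text/image features.

```python
import numpy as np

def fuse_prototypes(text_protos, image_protos, alpha=0.5):
    # Fuse per-class text and image prototypes into class prototypes.
    # A weighted average is used here for illustration; alpha is assumed.
    fused = alpha * text_protos + (1 - alpha) * image_protos
    return fused / np.linalg.norm(fused, axis=-1, keepdims=True)

def classify(query_feat, class_protos):
    # Cosine similarity between the query feature and each class prototype;
    # the most similar prototype gives the predicted class.
    q = query_feat / np.linalg.norm(query_feat)
    sims = class_protos @ q
    return int(np.argmax(sims)), sims

# Toy example: random stand-ins for 512-dim CLIP features, 3 classes.
rng = np.random.default_rng(0)
text_protos = rng.normal(size=(3, 512))   # one text prototype per class
image_protos = rng.normal(size=(3, 512))  # one image prototype per class
class_protos = fuse_prototypes(text_protos, image_protos)

# A query feature lying near class 1's image prototype.
query = image_protos[1] + 0.1 * rng.normal(size=512)
pred, sims = classify(query, class_protos)
```

Because the class prototypes combine both modalities, a query image can be matched against classes for which no training images exist, provided a text prototype is available.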