Fig. 1: Overall architecture for clustering.
From: Toward enhanced unsupervised clustering of 20th century Korean paintings via multimodal features

The proposed framework extracts complementary features from each image (RGB and HSV color features, GLCM texture features, and CLIP embeddings), concatenates them into a unified representation, reduces its dimensionality with t-SNE, and then applies K-means clustering to group visually and semantically similar images.
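
The pipeline described in the caption can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, histogram bin counts, the choice of GLCM statistics (contrast, homogeneity, energy from a single horizontal offset), the feature standardization step, and the number of clusters are all hypothetical choices made for the example. CLIP embeddings are assumed to be precomputed elsewhere and are replaced by random vectors in the demo.

```python
import colorsys

import numpy as np
from sklearn.cluster import KMeans
from sklearn.manifold import TSNE
from sklearn.preprocessing import StandardScaler

_rgb_to_hsv = np.vectorize(colorsys.rgb_to_hsv)


def rgb_histogram(img, bins=8):
    """Per-channel histogram of an (H, W, 3) uint8 image; each channel sums to 1."""
    feats = [np.histogram(img[..., c], bins=bins, range=(0, 256))[0] for c in range(3)]
    return np.concatenate(feats).astype(np.float64) / img[..., 0].size


def hsv_histogram(img, bins=8):
    """Per-channel histogram after converting RGB to HSV; each channel sums to 1."""
    r, g, b = (img[..., c] / 255.0 for c in range(3))
    h, s, v = _rgb_to_hsv(r, g, b)
    feats = [np.histogram(ch, bins=bins, range=(0.0, 1.0))[0] for ch in (h, s, v)]
    return np.concatenate(feats).astype(np.float64) / h.size


def glcm_features(img, levels=8):
    """Contrast, homogeneity, and energy of a symmetric, normalized GLCM
    built from horizontally adjacent pixel pairs (offset chosen for illustration)."""
    gray = img.mean(axis=2)
    q = np.minimum((gray / 256.0 * levels).astype(int), levels - 1)
    glcm = np.zeros((levels, levels), dtype=np.float64)
    np.add.at(glcm, (q[:, :-1].ravel(), q[:, 1:].ravel()), 1.0)
    glcm = glcm + glcm.T
    glcm /= glcm.sum()
    i, j = np.indices((levels, levels))
    contrast = np.sum(glcm * (i - j) ** 2)
    homogeneity = np.sum(glcm / (1.0 + (i - j) ** 2))
    energy = np.sqrt(np.sum(glcm ** 2))
    return np.array([contrast, homogeneity, energy])


def cluster_images(images, clip_embeddings, n_clusters=3, seed=0):
    """Concatenate all features, reduce to 2-D with t-SNE, then run K-means."""
    handcrafted = np.stack([
        np.concatenate([rgb_histogram(im), hsv_histogram(im), glcm_features(im)])
        for im in images
    ])
    features = np.hstack([handcrafted, clip_embeddings])
    # Assumption: standardize so no single feature family dominates by scale.
    features = StandardScaler().fit_transform(features)
    # t-SNE requires perplexity < number of samples.
    perplexity = float(min(30, len(images) - 1))
    embedded = TSNE(n_components=2, perplexity=perplexity, init="pca",
                    random_state=seed).fit_transform(features)
    labels = KMeans(n_clusters=n_clusters, n_init=10,
                    random_state=seed).fit_predict(embedded)
    return embedded, labels


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Stand-in data: random images and random vectors in place of real CLIP embeddings.
    images = [rng.integers(0, 256, size=(16, 16, 3), dtype=np.uint8) for _ in range(12)]
    clip_embeddings = rng.normal(size=(12, 512))
    embedded, labels = cluster_images(images, clip_embeddings)
    print(embedded.shape, labels)
```

In a real setting, `clip_embeddings` would come from a CLIP image encoder run on the same images, in the same order as `images`. Note that K-means is applied here to the t-SNE output, matching the order given in the caption; the cluster assignments therefore depend on the t-SNE random seed and perplexity.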