Fig. 3 | Scientific Reports
From: Vision language models versus machine learning models performance on polyp detection and classification in colonoscopy images
Tile-level importance analysis of GPT-4.1 and GPT-4 polyp detection using TiLense. Each image is divided into nine tiles, and the vision-language models (VLMs) are queried five times on the original image and on versions with one tile masked at a time. A reference answer is established for each image; each run whose answer deviates from the reference scores 1 point, giving every tile an importance score from 0 to 5, indicated by a color gradient from white to red, where red denotes a tile whose removal substantially alters the base answer. The final answer is determined by majority vote among the five answers. Panels (a–e) show tile-level predictions across image conditions: standard image without polyp (b), standard image with polyp (c), challenging image without polyp and with poor bowel preparation (d), and challenging image with a hard-to-see polyp (e).
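The scoring procedure described in the caption can be sketched as follows. This is a minimal illustration, not the authors' implementation: `predict` stands in for a hypothetical VLM query that returns an answer for the image with a given tile masked (`None` meaning the unmasked original), and the grid size, run count, and function names are assumptions.

```python
from collections import Counter

def tile_importance(image, predict, grid=3, runs=5):
    """Score each of grid*grid tiles by how often masking it flips the answer.

    `predict(image, masked_tile)` is a hypothetical VLM query: it returns the
    model's answer for `image` with tile `masked_tile` blanked out
    (masked_tile=None means the original, unmasked image).
    """
    # Reference answer: majority vote over `runs` queries on the original image.
    base_votes = Counter(predict(image, None) for _ in range(runs))
    reference = base_votes.most_common(1)[0][0]

    scores = {}
    for tile in range(grid * grid):
        # One point per run whose answer deviates from the reference (0..runs).
        scores[tile] = sum(predict(image, tile) != reference for _ in range(runs))
    return reference, scores

# Toy usage with a deterministic stand-in for the VLM: masking the center
# tile (index 4) flips the answer, so only that tile scores 5.
def mock_predict(image, masked_tile):
    return "no polyp" if masked_tile == 4 else "polyp"

reference, scores = tile_importance("frame.png", mock_predict)
```

Tiles with high scores would then be rendered in red on the white-to-red gradient shown in the figure.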