Scientific Reports

Table 4 Parameters of the pre-trained model.

From: A multi-modal sarcasm detection model based on cue learning

Parameter	Value
Image Encoder Architecture	ViT
Input Image Resolution	224 × 224 pixels
Image Patch Size	16 × 16 pixels
Image Encoder Layers	24
Image Encoder Dimension	1024
Image Encoder Heads	16
Text Encoder Layers	12
Text Encoder Dimension	768
Text Encoder Vocabulary	49408
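Several shapes follow directly from these values. For example, a 224 × 224 input cut into 16 × 16 patches gives 14 × 14 = 196 patches, and a 1024-dimensional encoder split across 16 heads gives 64 dimensions per head. The Python sketch below works through that arithmetic. The `EncoderConfig` class and its method names are hypothetical and used only for illustration. The extra class token is an assumption taken from the standard ViT convention, not something the table states. The table does not give a head count for the text encoder, so the sketch does not compute a text head dimension.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EncoderConfig:
    """Hypothetical container for the values listed in Table 4."""
    image_resolution: int = 224   # input side length in pixels
    patch_size: int = 16          # patch side length in pixels
    image_layers: int = 24
    image_width: int = 1024       # image encoder hidden dimension
    image_heads: int = 16
    text_layers: int = 12
    text_width: int = 768         # text encoder hidden dimension
    vocab_size: int = 49408

    def patches_per_side(self) -> int:
        # The image must tile evenly into non-overlapping patches.
        assert self.image_resolution % self.patch_size == 0
        return self.image_resolution // self.patch_size

    def num_patches(self) -> int:
        return self.patches_per_side() ** 2

    def image_seq_len(self, cls_token: bool = True) -> int:
        # Assumption: one prepended [CLS] token, as in standard ViT.
        return self.num_patches() + (1 if cls_token else 0)

    def image_head_dim(self) -> int:
        # Hidden dimension is split evenly across attention heads.
        assert self.image_width % self.image_heads == 0
        return self.image_width // self.image_heads

    def token_embedding_params(self) -> int:
        # One text_width-dimensional vector per vocabulary entry.
        return self.vocab_size * self.text_width


cfg = EncoderConfig()
print(cfg.num_patches())             # 196
print(cfg.image_seq_len())           # 197
print(cfg.image_head_dim())          # 64
print(cfg.token_embedding_params())  # 37945344
```

Under these assumptions, the image encoder processes 197 tokens per image, and the text token embedding table alone holds roughly 38 million parameters.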
