Table 7  Model size, latency, and peak memory on RTX-4090.

From: Leveraging large language models and embedding representations for enhanced word similarity computation

Model           Param (MiB)   SeqLen (tokens)   Latency (ms)   PeakMem (MiB)
Qwen-7B-Chat    7721.32       128               5979.9         14,821.1
                              256               5777.1         14,821.1
                              512               6225.6         14,821.1
Sentence-BERT   109.48        128               2.2            13,618.4
                              256               2.2            13,620.8
                              512               2.7            13,624.2
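
The paper does not include the benchmarking script, but numbers of this kind can be reproduced with a short PyTorch harness: time a forward pass around CUDA synchronization points and read the peak allocator statistics. The sketch below is an assumption-laden illustration, not the authors' code; the checkpoint name, batch construction, and warm-up count are placeholders.

# Minimal sketch (assumed setup) for measuring per-sequence-length latency and
# peak GPU memory of an embedding model; the checkpoint is a stand-in, not the
# exact Sentence-BERT model evaluated in Table 7.
import time

import torch
from transformers import AutoModel, AutoTokenizer

MODEL_NAME = "sentence-transformers/all-MiniLM-L6-v2"  # assumed placeholder checkpoint
SEQ_LENS = (128, 256, 512)

device = torch.device("cuda")
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModel.from_pretrained(MODEL_NAME).to(device).eval()

for seq_len in SEQ_LENS:
    # Fixed-length dummy input so latency depends only on the sequence length.
    inputs = tokenizer(
        "word", padding="max_length", max_length=seq_len,
        truncation=True, return_tensors="pt",
    ).to(device)

    torch.cuda.reset_peak_memory_stats(device)

    with torch.no_grad():
        for _ in range(3):                      # warm-up passes (assumed count)
            model(**inputs)
        torch.cuda.synchronize(device)
        start = time.perf_counter()
        model(**inputs)                         # timed forward pass
        torch.cuda.synchronize(device)
        latency_ms = (time.perf_counter() - start) * 1000.0

    peak_mib = torch.cuda.max_memory_allocated(device) / (1024 ** 2)
    print(f"seq_len={seq_len}: latency={latency_ms:.1f} ms, peak memory={peak_mib:.1f} MiB")

The explicit torch.cuda.synchronize calls matter because CUDA kernels launch asynchronously; without them the wall-clock timer would stop before the forward pass has actually finished on the GPU.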