Scientific Reports

Table 7 Model size, latency, and peak memory on RTX-4090.

From: Leveraging large language models and embedding representations for enhanced word similarity computation

Model	Param (MiB)	SeqLen (tokens)	Latency (ms)	PeakMem (MiB)
Qwen-7B-Chat	7721.32	128	5979.9	14,821.1
		256	5777.1	14,821.1
		512	6225.6	14,821.1
Sentence-BERT	109.48	128	2.2	13,618.4
		256	2.2	13,620.8
		512	2.7	13,624.2
