Table 7: Model size, latency, and peak memory on RTX-4090.

| Model | Params (MiB) | Sequence length (tokens) | Latency (ms) | Peak memory (MiB) |
|---|---|---|---|---|
| Qwen-7B-Chat | 7721.32 | 128 | 5979.9 | 14,821.1 |
| | | 256 | 5777.1 | 14,821.1 |
| | | 512 | 6225.6 | 14,821.1 |
| Sentence-BERT | 109.48 | 128 | 2.2 | 13,618.4 |
| | | 256 | 2.2 | 13,620.8 |
| | | 512 | 2.7 | 13,624.2 |