Table 1 DDI Results for Different Models for \(\theta \in \{50, 80, 90, 95, 99\}\) on the HumanEval dataset. \(R^2\) indicates exponential fit quality: Excellent (\(R^2 \ge 0.9\)), Good (\(0.7 \le R^2 < 0.9\)), Poor (\(R^2 < 0.7\)). Models with \(\lambda = \text{None}\) had too few data points for exponential fitting after zero-effectiveness values were filtered out.
From: Measuring and mitigating debugging effectiveness decay in code language models
| Model | \(E_0\) | \(\lambda\) | \(A_0\) | \(t_\theta\) | \(R^2\) |
|---|---|---|---|---|---|
| claude-3-7-sonnet-20250219 | 93.902 | None | 100.00 | [] | None |
| codegemma:7b | 51.219 | 0.9309 | 66.463 | [1, 2, 3, 4, 5] | Excellent |
| codellama:7b | 21.341 | 0.2467 | 45.122 | [3, 7, 10, 13, 19] | Poor |
| codestral:22b | 58.537 | 0.3388 | 89.024 | [3, 5, 7, 9, 14] | Good |
| deepseek-coder-v2:16b | 71.951 | 0.9692 | 84.146 | [1, 2, 3, 4, 5] | Excellent |
| deepseek-coder:6.7b | 45.732 | 0.4737 | 74.390 | [2, 4, 5, 7, 10] | Excellent |
| devstral:24b | 84.146 | 0.6438 | 94.512 | [2, 3, 4, 5, 8] | Excellent |
| gemma2:9b | 59.146 | 0.7632 | 76.219 | [1, 3, 4, 4, 7] | Excellent |
| gpt-3.5-turbo | 73.781 | 1.3297 | 82.317 | [1, 2, 2, 3, 4] | Excellent |
| gpt-3.5-turbo-1106 | 70.732 | 0.7553 | 85.976 | [1, 3, 4, 4, 7] | Excellent |
| gpt-4-1106-preview | 90.244 | 0.7619 | 96.951 | [1, 3, 4, 4, 7] | Poor |
| granite3.3:8b | 68.902 | 0.9482 | 82.317 | [1, 2, 3, 4, 5] | Excellent |
| llama2:7b | 3.659 | 0.1185 | 10.976 | [6, 14, 20, 26, 39] | Poor |
| llama3.1:8b | 56.707 | 1.1142 | 72.561 | [1, 2, 3, 3, 5] | Excellent |
| mistral:instruct | 29.878 | 0.5291 | 54.268 | [2, 4, 5, 6, 9] | Excellent |
| phi4-reasoning:14b | 59.146 | 0.6052 | 81.098 | [2, 3, 4, 5, 8] | Excellent |
| phi4:14b | 83.537 | 0.7680 | 93.293 | [1, 3, 3, 4, 6] | Excellent |
| qwen2.5-coder | 76.219 | 0.4624 | 94.159 | [2, 4, 5, 7, 10] | Excellent |
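A minimal sketch of how the \(t_\theta\) column relates to \(\lambda\), assuming the exponential decay model \(E(t) = A_0 e^{-\lambda t}\) and rounding up to the next whole iteration (this reproduces the table's values; the `decay_times` helper and the ceiling convention are our assumptions, not taken from the paper):

```python
import math

def decay_times(lam, thetas=(50, 80, 90, 95, 99)):
    """Number of iterations until effectiveness decays to (100 - theta)%
    of its initial amplitude, assuming E(t) = A_0 * exp(-lam * t).
    Solving A_0 * exp(-lam * t) = A_0 * (100 - theta) / 100 gives
    t = ln(100 / (100 - theta)) / lam, rounded up to a whole iteration."""
    return [math.ceil(math.log(100 / (100 - th)) / lam) for th in thetas]

# codegemma:7b (lambda = 0.9309) from the table:
print(decay_times(0.9309))  # -> [1, 2, 3, 4, 5]

# codellama:7b (lambda = 0.2467), a slower-decaying model:
print(decay_times(0.2467))  # -> [3, 7, 10, 13, 19]
```

Note that \(t_\theta\) depends only on \(\lambda\): a smaller decay rate (e.g. llama2:7b, \(\lambda = 0.1185\)) stretches every threshold crossing further out, while \(A_0\) only scales the curve's starting amplitude.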