Table 2 Averaged normalized HV and IGD values of synthetic tasks (upper) and RE tasks (lower) in the Off-MOO benchmark, where the best and runner-up results on each task are highlighted by bold and underlined numbers.

From: Learning design-score manifold to guide diffusion models for offline optimization

 

| Method | ZDT (n=2) Avg. HV (↑) | ZDT (n=2) Avg. IGD (↓) | OmniTest (n=2) Avg. HV (↑) | OmniTest (n=2) Avg. IGD (↓) | DTLZ (n=3) Avg. HV (↑) | DTLZ (n=3) Avg. IGD (↓) | HV Rank (↓) | IGD Rank (↓) |
|---|---|---|---|---|---|---|---|---|
| \({{\mathcal{D}}}_{\,\text{train}}^{\text{(best)}\,}\) (Preferred) | 1.0 (1.118) | 1.0 (0.0) | 1.0 (1.056) | 1.0 (0.0) | 1.0 (1.098) | 1.0 (0.0) | / | / |
| MM-NSGA2 (k=1) | 0.888 ± 0.013 | 2.555 ± 0.118 | 1.044 ± 0.010 | 0.454 ± 0.129 | 0.987 ± 0.012 | 0.635 ± 0.048 | 8.3 / 10 | 6.7 / 10 |
| MO-DDOM (k=1) | 0.948 ± 0.009 | 1.674 ± 0.150 | 0.983 ± 0.002 | 0.827 ± 0.046 | 0.932 ± 0.014 | 0.795 ± 0.063 | 9.3 / 10 | 8.0 / 10 |
| ManGO (k=1) | 1.097 ± 0.004 | 0.729 ± 0.064 | 1.050 ± 0.002 | 0.399 ± 0.068 | 0.971 ± 0.016 | 0.739 ± 0.104 | 6.0 / 10 | 6.0 / 10 |
| ManGO+Self-IS (k=1) | 1.099 ± 0.005 | 0.702 ± 0.065 | 1.050 ± 0.001 | 0.359 ± 0.017 | 1.053 ± 0.015 | 0.250 ± 0.084 | 4.3 / 10 | 3.7 / 10 |
| MM-MOBO | 0.963 ± 0.007 | 4.723 ± 0.164 | **1.056 ± 0.000** | 0.206 ± 0.019 | 1.075 ± 0.000 | 0.362 ± 0.016 | 4.0 / 10 | 6.0 / 10 |
| ParetoFlow | 1.000 ± 0.008 | 2.867 ± 0.405 | 0.953 ± 0.057 | 1.523 ± 0.567 | 0.998 ± 0.009 | 0.672 ± 0.115 | 7.7 / 10 | 8.3 / 10 |
| MM-NSGA2 (k=256) | 1.055 ± 0.003 | 3.592 ± 0.044 | 1.046 ± 0.002 | 1.008 ± 0.019 | **1.086 ± 0.000** | 0.752 ± 0.016 | 4.0 / 10 | 9.0 / 10 |
| MO-DDOM (k=256) | 0.981 ± 0.006 | 1.052 ± 0.144 | 1.033 ± 0.001 | 0.270 ± 0.010 | 1.054 ± 0.006 | 0.255 ± 0.044 | 6.7 / 10 | 4.3 / 10 |
| ManGO (k=256) | **1.107 ± 0.002** | **0.420 ± 0.030** | 1.051 ± 0.002 | <u>0.118 ± 0.015</u> | 1.066 ± 0.003 | <u>0.172 ± 0.013</u> | 2.7 / 10 | 1.7 / 10 |
| ManGO+Self-IS (k=256) | <u>1.106 ± 0.002</u> | <u>0.445 ± 0.043</u> | <u>1.052 ± 0.000</u> | **0.094 ± 0.002** | <u>1.079 ± 0.003</u> | **0.123 ± 0.033** | 2.0 / 10 | 1.3 / 10 |

 

| Method | RE (n=2) Avg. HV (↑) | RE (n=2) Avg. IGD (↓) | RE (n=3) Avg. HV (↑) | RE (n=3) Avg. IGD (↓) | RE (n=4) Avg. HV (↑) | RE (n=4) Avg. IGD (↓) | HV Rank (↓) | IGD Rank (↓) |
|---|---|---|---|---|---|---|---|---|
| \({{\mathcal{D}}}_{\,\text{train}}^{\text{(best)}\,}\) (Preferred) | 1.0 (1.037) | 1.0 (0.0) | 1.0 (1.082) | 1.0 (0.0) | 1.0 (1.310) | 1.0 (0.0) | / | / |
| MM-NSGA2 (k=1) | 1.016 ± 0.004 | 56.958 ± 12.159 | 0.935 ± 0.007 | 2.979 ± 0.159 | 0.780 ± 0.009 | 1.504 ± 0.060 | 9.3 / 10 | 9.7 / 10 |
| MO-DDOM (k=1) | 1.010 ± 0.012 | 2.261 ± 0.220 | 1.046 ± 0.002 | 0.579 ± 0.020 | 1.058 ± 0.003 | 0.845 ± 0.008 | 8.7 / 10 | 4.7 / 10 |
| ManGO (k=1) | 1.024 ± 0.004 | 6.809 ± 0.740 | 1.051 ± 0.011 | 0.782 ± 0.083 | 1.234 ± 0.009 | 0.421 ± 0.018 | 5.7 / 10 | 6.0 / 10 |
| ManGO+Self-IS (k=1) | 1.022 ± 0.004 | 4.717 ± 2.081 | 1.066 ± 0.005 | 0.588 ± 0.027 | 1.240 ± 0.016 | <u>0.304 ± 0.020</u> | 4.7 / 10 | 4.3 / 10 |
| MM-MOBO | 1.027 ± 0.002 | **1.156 ± 0.133** | <u>1.071 ± 0.001</u> | 0.912 ± 0.093 | 1.123 ± 0.009 | 0.816 ± 0.029 | 3.7 / 10 | 5.0 / 10 |
| ParetoFlow | 1.017 ± 0.004 | 12.403 ± 0.052 | 1.010 ± 0.004 | 1.806 ± 0.024 | 0.668 ± 0.001 | 1.887 ± 0.006 | 9.0 / 10 | 8.7 / 10 |
| MM-NSGA2 (k=256) | **1.034 ± 0.000** | 98.945 ± 0.011 | 1.069 ± 0.001 | 2.160 ± 0.029 | <u>1.249 ± 0.012</u> | 0.320 ± 0.052 | 2.0 / 10 | 6.7 / 10 |
| MO-DDOM (k=256) | 1.020 ± 0.001 | <u>1.476 ± 0.031</u> | 1.056 ± 0.001 | <u>0.552 ± 0.007</u> | 1.073 ± 0.004 | 0.808 ± 0.008 | 6.7 / 10 | 3.3 / 10 |
| ManGO (k=256) | 1.026 ± 0.002 | 3.373 ± 0.428 | 1.063 ± 0.005 | 0.611 ± 0.026 | 1.248 ± 0.006 | 0.388 ± 0.014 | 4.0 / 10 | 4.7 / 10 |
| ManGO+Self-IS (k=256) | <u>1.028 ± 0.001</u> | 2.264 ± 0.237 | **1.072 ± 0.002** | **0.498 ± 0.013** | **1.264 ± 0.002** | **0.234 ± 0.020** | 1.3 / 10 | 2.0 / 10 |

  1. Higher HV values indicate better performance, while lower IGD values are preferred. \({{\mathcal{D}}}_{\,\text{train}}^{\text{(best)}\,}\) (Preferred) denotes the best (preferred) HV/IGD in the offline training dataset. For compact presentation, the reported numbers are task performance averaged over the tasks with the same objective number n. The ZDT (n = 2), OmniTest (n = 2), and DTLZ (n = 3) sets consist of 5 ZDT tasks [41], 1 OmniTest task [52], and 2 DTLZ tasks [42], respectively. The RE (n = 2), RE (n = 3), and RE (n = 4) sets comprise 5, 7, and 2 real-world application tasks [43] with n = 2, 3, and 4, respectively. Note that each task's results are normalized by the best HV and IGD of its training dataset; RE (n = 2) shows higher averaged IGD values because the IGD values of the RE22 task are roughly ten times larger than those of the other tasks. MO-DDOM denotes a standard conditional diffusion-based baseline method.
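For readers unfamiliar with the two indicators, a minimal sketch of how they are typically computed is given below (bi-objective minimization case; the function names, the reference point, and the normalization by the training-set value are illustrative assumptions, not the benchmark's actual implementation):

```python
import numpy as np

def hypervolume_2d(front, ref):
    """Hypervolume of a 2-objective minimization front: the area dominated
    by the front and bounded above by the reference point `ref`."""
    # Keep only nondominated points, sorted by the first objective.
    nd, best_f2 = [], float("inf")
    for f1, f2 in sorted(map(tuple, front)):
        if f2 < best_f2:           # strictly improves the second objective
            nd.append((f1, f2))
            best_f2 = f2
    # Sweep left to right, adding one rectangle per nondominated point.
    hv, prev_f2 = 0.0, ref[1]
    for f1, f2 in nd:
        hv += (ref[0] - f1) * (prev_f2 - f2)
        prev_f2 = f2
    return hv

def igd(pareto_front, solutions):
    """Inverted generational distance: mean Euclidean distance from each
    reference Pareto-front point to its nearest obtained solution."""
    pf = np.asarray(pareto_front, float)
    sols = np.asarray(solutions, float)
    dists = np.linalg.norm(pf[:, None, :] - sols[None, :, :], axis=-1)
    return dists.min(axis=1).mean()

# Illustrative normalization, mirroring the table's convention: divide a
# method's raw HV by the HV of the best training subset (hypothetical data).
# normalized_hv = hypervolume_2d(method_front, ref) / hypervolume_2d(train_best_front, ref)
```

A solution set that exactly matches the reference Pareto front attains IGD = 0, which is why the \({{\mathcal{D}}}_{\,\text{train}}^{\text{(best)}\,}\) row reports raw IGD values of 0.0.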