Table 2. MAP results of different methods on the TVGraz dataset.
From: Cross-modal semantic autoencoder with embedding consensus
\(R=40\) | Image-text | Text-image | Average | \(R=\text{all}\) | Image-text | Text-image | Average |
---|---|---|---|---|---|---|---|
CCA | 0.629 | 0.624 | 0.627 | CCA | 0.612 | 0.603 | 0.619 |
BLM | 0.637 | 0.625 | 0.634 | BLM | 0.623 | 0.618 | 0.626 |
LCFS | 0.647 | 0.647 | 0.651 | LCFS | 0.637 | 0.625 | 0.634 |
LGCFL | 0.658 | 0.641 | 0.653 | LGCFL | 0.649 | 0.636 | 0.641 |
JFSSL | 0.654 | 0.645 | 0.656 | JFSSL | 0.654 | 0.649 | 0.657 |
CSAEC | 0.672 | 0.653 | 0.671 | CSAEC | 0.663 | 0.659 | 0.674 |
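
The table reports MAP at two cutoffs, \(R=40\) and \(R=\text{all}\), without restating how the score is computed. The sketch below shows one common way to compute it in cross-modal retrieval, assuming that average precision is taken over the top \(R\) ranked results and normalized by the number of relevant items found within that cutoff. The function names, the binary relevance lists, and this normalization are illustrative assumptions, not the paper's evaluation code.

```python
from typing import Optional, Sequence


def average_precision(relevance: Sequence[int], r: Optional[int] = None) -> float:
    """Average precision of one ranked list.

    relevance: 1 if the item at that rank shares the query's class, else 0.
    r: cutoff on the ranked list; None means use every retrieved item (R = all).
    Assumption: normalize by the number of relevant items inside the cutoff.
    """
    top = relevance if r is None else relevance[:r]
    hits = 0
    precision_sum = 0.0
    for rank, rel in enumerate(top, start=1):
        if rel:
            hits += 1
            precision_sum += hits / rank  # precision at this relevant rank
    return precision_sum / hits if hits else 0.0


def mean_average_precision(
    rankings: Sequence[Sequence[int]], r: Optional[int] = None
) -> float:
    """Mean of the per-query average precision values."""
    if not rankings:
        raise ValueError("at least one query ranking is required")
    return sum(average_precision(q, r) for q in rankings) / len(rankings)


if __name__ == "__main__":
    # Two hypothetical queries, each with a ranked list of relevance flags.
    queries = [[1, 0, 1, 0], [0, 1]]
    print(mean_average_precision(queries))       # R = all
    print(mean_average_precision(queries, r=2))  # cutoff at the top 2 results
```

Under this convention, the Image-text column would come from ranking text items for each image query, and the Text-image column from ranking images for each text query.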