Table 9 Comparison with state-of-the-art methods

From: Leveraging two-dimensional pre-trained vision transformers for three-dimensional model generation via masked autoencoders

| Model | Main points | Masking | Semantic |
| --- | --- | --- | --- |
| ShapeLLM [100] | Cross-modal framework | Yes | No |
| OmniVec [101] | 3D object understanding | Yes | Yes |
| GPSFormer [102] | Global perception and local structure fitting-based Transformer | No | No |
| TripoSR [103] | Transformer architecture for fast feed-forward 3D generation | No | No |
| Geometry [104] | Geometry-biased attention mechanism | Yes | Yes |
| UniScene [105] | Multi-camera unified pre-training framework | Yes | No |
| Proposed [Vit3D] | Robust multi-scale MAE pre-training architecture | Yes | Yes |