Table 3: 3D object classification on ModelNet40

From: Leveraging two-dimensional pre-trained vision transformers for three-dimensional model generation via masked autoencoders

Model                       Input points   Without voting   With voting
PointNet [78]               1K             89.20            -
PointCNN [98]               1K             92.20            -
Transformer [60]            1K             91.40            -
[FT] Point-BERT [60]        1K             92.85            93.20
[FT] Point-MAE [51]         1K             93.20            93.80
[FT] Point-M2AE [52]        1K             93.40            94.00
[FT] I2P-MAE [93]           1K             93.70            94.10
ViT3D (proposed)            1K             93.97            94.41
Improvement over I2P-MAE    1K             +0.27            +0.31

Values are classification accuracy (%); "-" means the result is not reported.
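The improvement row can be re-derived from the reported accuracies. The sketch below assumes the row reports the margin of ViT3D over the strongest prior entry in each column (which is [FT] I2P-MAE in both columns); the names `PRIOR`, `VIT3D`, and `margin_over_best` are hypothetical and only serve this check.

```python
from typing import Dict, Optional, Tuple

# Accuracy (%) on ModelNet40 with 1K input points, copied from Table 3.
# Tuple = (without voting, with voting); None marks a result the table does not report.
PRIOR: Dict[str, Tuple[float, Optional[float]]] = {
    "PointNet": (89.20, None),
    "PointCNN": (92.20, None),
    "Transformer": (91.40, None),
    "Point-BERT": (92.85, 93.20),
    "Point-MAE": (93.20, 93.80),
    "Point-M2AE": (93.40, 94.00),
    "I2P-MAE": (93.70, 94.10),
}
VIT3D: Tuple[float, float] = (93.97, 94.41)


def margin_over_best(prior, ours):
    """Margin (percentage points) of `ours` over the best reported prior entry per column."""
    margins = []
    for col in (0, 1):
        # Skip entries that have no reported value in this column.
        reported = [acc[col] for acc in prior.values() if acc[col] is not None]
        margins.append(round(ours[col] - max(reported), 2))
    return tuple(margins)


print(margin_over_best(PRIOR, VIT3D))  # (0.27, 0.31)
```

Rounding to two decimals absorbs floating-point noise from the subtraction, so the printed margins match the precision used in the table.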