Table 3 Quantitative performance metrics of the fine-tuned SAM 2 model across various datasets and classes, showing Dice coefficients for different training data scales (50, 100, 200, and 400 samples per class) with 10 prompt points. Comparisons with prior SOTA methods are highlighted in the last column with the deltas in parentheses, improvements in green and declines in red.

From: A fine-tuned foundational model SurgiSAM2 for surgical video anatomy segmentation and detection

Datasets and their organ classes are listed with n = number of examples in the validation and test subsets. The four training data scale columns (50/class to 400/class) use 10 point prompts.

| Dataset / organ class (n validation, n test) | 50/class | 100/class | 200/class | 400/class | Prior SOTA score | Prior SOTA model | MedSAM (vit-b) | Test subset with 1 prompt point | Test subset with 10 prompt points (delta) |
|---|---|---|---|---|---|---|---|---|---|
| CholecSeg8k (4548, 3611) | | | | | | | | | |
| Abdominal wall (951, 943) | 0.97 | 0.97 | 0.97 | 0.97 | 0.84 | DynUnet [33] | 0.30 | 0.88 | 0.96 (+0.12) |
| Fat (720, 640) | 0.94 | 0.95 | 0.96 | 0.97 | 0.91 | U-net++ [33] | 0.21 | 0.89 | 0.96 (+0.05) |
| Gallbladder (720, 537) | 0.90 | 0.90 | 0.91 | 0.91 | | Segmenter [34] | 0.41 | 0.90 | 0.91 |
| Gastrointestinal tract (557, 668) | 0.96 | 0.95 | 0.96 | 0.96 | 0.35 | UNETR [33] | 0.66 | 0.88 | 0.89 (+0.54) |
| Liver (1600, 640) | 0.93 | 0.93 | 0.93 | 0.94 | 0.98 | uNet/uNet++/DeepLABv3+ [35] | 0.47 | 0.91 | 0.96 (−0.02) |
| Weighted mean Dice coefficient for dataset (for tissue classes) | | | | | | | | | 0.94 |
| Dresden (621, 851) | | | | | | | | | |
| Abdominal wall† (56, 114) | 0.93 | 0.92 | 0.93 | 0.90 | 0.91 | SegFormer [36] | 0.48 | 0.56 | 0.81 (−0.10) |
| Colon (52, 121) | 0.89 | 0.90 | 0.89 | 0.91 | 0.79 | DeepLABv3 [36] | 0.31 | 0.83 | 0.92 (+0.13) |
| Inferior mesenteric artery (61, 44) | 0.77 | 0.78 | 0.77 | 0.79 | 0.6 | SegFormer [36] | 0.25 | 0.77 | 0.86 (+0.26) |
| Intestinal veins (49, 52) | 0.74 | 0.81 | 0.78 | 0.78 | 0.65 | SegFormer [36] | 0.20 | 0.72 | 0.74 (+0.09) |
| Liver† (83, 81) | 0.93 | 0.93 | 0.93 | 0.94 | 0.83 | SegFormer [36] | 0.42 | 0.88 | 0.96 (+0.13) |
| Pancreas (42, 51) | 0.84 | 0.87 | 0.86 | 0.88 | 0.47 | SegFormer [36] | 0.22 | 0.78 | 0.81 (+0.34) |
| Small intestine (53, 108) | 0.95 | 0.95 | 0.95 | 0.96 | 0.89 | SegFormer [36] | 0.31 | 0.90 | 0.95 (+0.06) |
| Spleen (50, 14) | 0.97 | 0.96 | 0.97 | 0.97 | 0.85 | SegFormer [36] | 0.40 | 0.87 | 0.91 (+0.06) |
| Stomach (78, 129) | 0.90 | 0.91 | 0.90 | 0.93 | 0.75 | SegFormer [36] | 0.36 | 0.91 | 0.91 (+0.16) |
| Ureter (43, 71) | 0.50 | 0.54 | 0.57 | 0.54 | 0.58 | SegFormer [36] | 0.28 | 0.68 | 0.75 (+0.17) |
| Vesicular glands (54, 66) | 0.76 | 0.80 | 0.80 | 0.78 | 0.43 | SegFormer [36] | 0.20 | 0.63 | 0.74 (+0.31) |
| Weighted mean Dice coefficient for dataset (for tissue classes) | | | | | | | | | 0.89 |
| UreterUD (115, 114) | | | | | | | | | |
| Uterine artery (50, 22) | 0.89 | 0.88 | 0.90 | 0.89 | 0.92 | U-Net [23] | 0.19 | 0.79 | 0.86 (−0.06) |
| Nerve (30, 38) | 0.71 | 0.72 | 0.78 | 0.78 | 0.90 | U-Net [23] | 0.23 | 0.77 | 0.76 (−0.14) |
| Ureter (35, 54) | 0.86 | 0.87 | 0.86 | 0.86 | 0.89 | U-Net [23] | 0.36 | 0.90 | 0.91 (+0.02) |
| Weighted mean Dice coefficient for dataset (for tissue classes) | | | | | | | | | 0.85 |
| Endoscapes (226, 183); only single metric reported | | | | | | | | | |
| Cystic artery (34, 47) | 0.66 | 0.71 | 0.73 | 0.72 | 0.74 | Mask2Former [24] | 0.32 | 0.70 | 0.75 (+0.01) |
| Cystic duct (65, 53) | 0.72 | 0.77 | 0.75 | 0.79 | 0.74 | Mask2Former [24] | 0.35 | 0.55 | 0.74 (0.00) |
| Cystic plate (29, 18) | 0.72 | 0.77 | 0.79 | 0.75 | 0.74 | Mask2Former [24] | 0.38 | 0.59 | 0.74 (0.00) |
| Gallbladder† (73, 58) | 0.84 | 0.83 | 0.85 | 0.84 | 0.74 | Mask2Former [24] | 0.33 | 0.58 | 0.80 (+0.06) |
| Hepatocystic triangle (25, 7) | 0.64 | 0.67 | 0.67 | 0.62 | 0.74 | Mask2Former [24] | 0.34 | 0.78 (+0.04) | 0.66 |
| Weighted mean Dice coefficient for dataset (for tissue classes) | | | | | | | | | 0.75 |
| m2caiSeg (317, 154) | | | | | | | | | |
| Artery† (11, 12) | 0.71 | 0.74 | 0.72 | 0.69 | 0.16 | Custom encoder-decoder CNN [25] | 0.22 | 0.80 (+0.64) | 0.78 (+0.62) |
| Fat† (28, 30) | 0.78 | 0.79 | 0.81 | 0.78 | 0.71 | Custom encoder-decoder CNN [25] | 0.20 | 0.61 | 0.80 (+0.09) |
| Gallbladder† (249, 49) | 0.81 | 0.82 | 0.83 | 0.79 | 0.7 | Custom encoder-decoder CNN [25] | 0.33 | 0.55 | 0.82 (+0.12) |
| Intestine† (7, 9) | 0.90 | 0.89 | 0.87 | 0.90 | 0.33 | Custom encoder-decoder CNN [25] | 0.29 | 0.78 | 0.84 (+0.51) |
| Liver† (29, 31) | 0.88 | 0.87 | 0.88 | 0.90 | 0.87 | Custom encoder-decoder CNN [25] | 0.35 | 0.74 | 0.90 (+0.03) |
| Upper wall (22, 23) | 0.96 | 0.95 | 0.96 | 0.95 | 0.58 | Custom encoder-decoder CNN [25] | 0.42 | 0.91 | 0.96 (+0.38) |
| Weighted mean Dice coefficient for dataset (for tissue classes) | | | | | | | | | 0.85 |
| Weighted mean Dice coefficient (for all tissue classes)†† | 0.91 | 0.92 | 0.92 | 0.92 | – | | 0.38 | 0.85 | 0.91 |
| Mean Dice coefficient (for all tissue classes) | 0.83 | 0.85 | 0.85 | 0.85 | – | | 0.33 | 0.77 | 0.85 |

  1. *SOTA: state-of-the-art.
  2. Highlighted in bold are the best mean Dice scores for each organ in a comparison between prior SOTA and the SurgiSAM 2 performance on the test subset.
  3. Better than prior SOTA is presented in green and worse than prior SOTA in red. The delta in performance between test subset and prior SOTA is reported in parentheses.
  4. †Classes excluded from fine-tuning.
  5. ††To address the disproportionate impact of a single dataset (CholecSeg8k) on the weighted mean Dice coefficient (WMDC), we also reported the mean Dice coefficient (MDC), calculated as the unweighted average of Dice scores across all classes.
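
The difference between the two summary rows can be illustrated with a minimal sketch. It assumes that the WMDC weights each class by its number of test-subset examples; the table does not state the weighting scheme, although this assumption reproduces the 0.85 reported for UreterUD. The function names `weighted_mean_dice` and `mean_dice` are hypothetical, and the inputs are the UreterUD scores with 10 prompt points together with their test-subset counts.

```python
def weighted_mean_dice(scores, counts):
    """Mean of per-class Dice scores, weighted by per-class sample counts.

    Assumption: the weights are test-subset example counts per class.
    """
    total = sum(counts)
    if total == 0:
        raise ValueError("counts must sum to a positive number")
    return sum(s * n for s, n in zip(scores, counts)) / total


def mean_dice(scores):
    """Unweighted average of per-class Dice scores (MDC)."""
    return sum(scores) / len(scores)


# UreterUD, test subset with 10 prompt points: (Dice, n test examples)
ureterud = {
    "Uterine artery": (0.86, 22),
    "Nerve": (0.76, 38),
    "Ureter": (0.91, 54),
}

scores = [dice for dice, _ in ureterud.values()]
counts = [n for _, n in ureterud.values()]

wmdc = weighted_mean_dice(scores, counts)
mdc = mean_dice(scores)
print(f"WMDC = {wmdc:.2f}, MDC = {mdc:.2f}")  # WMDC = 0.85, MDC = 0.84
```

Because the weighted mean scales each class by its sample count, a dataset with thousands of examples per class (such as CholecSeg8k) dominates the pooled WMDC, whereas the MDC treats every class equally.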