Table 2. Comparison of the data category distribution between the Draper dataset and the undersampled Draper subset.

From: A lightweight transformer based multi task learning model with dynamic weight allocation for improved vulnerability prediction

| Type | Original dataset | Undersampled dataset |
|---|---|---|
| CWE-119 | 24157 (1.9%) | 24157 (12.0%) |
| CWE-120 | 47660 (3.7%) | 47660 (23.6%) |
| CWE-469 | 2625 (0.2%) | 2625 (1.3%) |
| CWE-476 | 12094 (0.9%) | 12094 (6.0%) |
| CWE-other | 35028 (2.7%) | 35028 (17.3%) |
| CWE-none | 1191955 (93.5%) | 119196 (59.1%) |
| Total | 1274366 | 201607 |
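
The percentages in Table 2 can be re-derived from the counts. The following is a minimal Python sketch, assuming (the table does not state this) that each percentage is the category count divided by the dataset total in the last row. The names `ORIGINAL`, `UNDERSAMPLED`, and `percentages` are illustrative and do not come from the paper.

```python
# Counts copied from Table 2. All names here are illustrative, not the authors'.
ORIGINAL = {
    "CWE-119": 24157,
    "CWE-120": 47660,
    "CWE-469": 2625,
    "CWE-476": 12094,
    "CWE-other": 35028,
    "CWE-none": 1191955,
}
UNDERSAMPLED = {
    "CWE-119": 24157,
    "CWE-120": 47660,
    "CWE-469": 2625,
    "CWE-476": 12094,
    "CWE-other": 35028,
    "CWE-none": 119196,
}
ORIGINAL_TOTAL = 1274366
UNDERSAMPLED_TOTAL = 201607


def percentages(counts, total):
    """Return each category's share of `total` in percent, rounded to one decimal.

    Assumption: the table's percentages are count / total.
    """
    return {name: round(100 * n / total, 1) for name, n in counts.items()}


if __name__ == "__main__":
    for label, counts, total in [
        ("original", ORIGINAL, ORIGINAL_TOTAL),
        ("undersampled", UNDERSAMPLED, UNDERSAMPLED_TOTAL),
    ]:
        print(label)
        for name, pct in percentages(counts, total).items():
            print(f"  {name}: {counts[name]} ({pct}%)")
        # The per-category counts add up to more than the stated total.
        print(f"  sum of category counts: {sum(counts.values())} vs total {total}")
```

Under this assumption, the recomputed shares agree with the reported percentages to within 0.1 percentage point. In both columns the category counts sum to more than the stated total (1313519 against 1274366 for the original dataset), which would be consistent with some samples carrying more than one CWE label, although the table itself does not say so. Only the CWE-none count changes between the two columns, and 119196 is roughly one tenth of 1191955.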