Table 1 Statistical comparison of existing datasets and our Broncho-R dataset, including the dataset name, data source, number of patients, number of images, and sub-tasks covered.

Dataset Name	Data Source	Patients	Images	Sub-Tasks
BronchoLC¹¹	Public	208	2,921	CLS, SEG
UAAL¹²	Public	—	3,814	CLS, SEG
B12K¹⁴	Public	615	2,900	CLS
PKDN¹³	Private	200	2,029	CLS
Ours	Public	3,692	6,330	CAP, CLS

In general, existing datasets have three main shortcomings: (i) they lack comprehensive, evenly distributed departmental coverage and a sufficient number of patients to prevent bias; (ii) their data sources are often private and inaccessible; (iii) they usually carry annotations for only a single task. “CLS” refers to the classification task, “SEG” to the segmentation task, and “CAP” to the captioning task.
