Extended Data Table 1 Overview of human-centric computer vision (HCCV) datasets commonly used for fairness
From: Fair human-centric image dataset for ethical AI benchmarking

- This table compares the properties of 27 HCCV datasets frequently used for evaluating bias in computer vision models. Features include dataset size, collection method, availability of annotations (bounding boxes [BB], key points [KP], segmentation masks [SM]), consent details, terms of use, and demographic diversity attributes.
- Abbreviations: BB (a: automatic, m: manual; F: face, O: object, P: person); KP/SM (a: automatic, m: manual, v: manually verified, with the integer denoting the number of key points/landmarks or segmentation categories); Consent (no details: consent obtained, but no details provided; details: consent details provided, but no explicit mention of AI; details, for AI: consent details provided, including data processing for AI fairness purposes); Terms of Use (n-c: non-commercial; research: research only; eval.: evaluation only; edu.: educational use; revoked: authors no longer make the dataset available). Attributes marked with * are self-reported. (-) denotes where the relevant information was not available.
- Datasets (bracketed numbers are reference citations): MS-Celeb-1M[127], YFCC100M[149], Megaface[150], VGGFace[151], Diversity in Faces (DiF)[152], Pilot Parl. Benchmark[9], FRGC[153], RWF[154], Morph[155], Adience[156], BUPT-Globalface[157], WIDERFACE-DEMO[158], KANFace[159], FairFace[160], ImageNet (ILSVRC)[161], CelebA[86], LFWA[162], MTFL[163], UTKFace[164], MIAP[42], FACET[24], MS-COCO[40], VQA 2.0[41], Casual Conversations[25], CCV2[26], Chicago Face Database[27], Dollar Street[43].