Table 4 Potential biases identified in reflexivity analyses of transcripts from adolescent girls and young women and from community males.
From: Evaluation of large language models within GenAI in qualitative research
| Adolescent girls and young women | Community males |
|---|---|
| **Selection bias related to the data source** | **Selection bias related to the data source** |
| Limited to/specific to the dataset; not representative | Texts provided might not represent all men or girls in Kenya or other regions |
| Perspectives not in the dataset – parents, teachers, community leaders | If the text provided is not representative of the broader context; if quotes chosen emphasize certain perspectives over others. [Note: in this response, GenAI combines selection bias of participants with selection bias in how it selected quotes; it does not reference the training data as underlying the potentially biased quote selection] |
| Limited to the data provided, which may not cover all aspects/dimensions of the issues faced by schoolgirls during the pandemic | |
| **Selection bias in how GenAI selected quotes, which may be affected by training data** | **Selection bias in how GenAI selected quotes, which may be affected by training data** |
| Termed by GenAI: Confirmation bias. Focus on information that confirms existing beliefs or expectations about the impact of COVID on girls; may highlight quotes that confirm the most prominent themes/observations | Termed by GenAI: Selection bias, Representation bias, Confirmation bias. Overemphasizing parts of the text based on the AI’s training data and algorithms; quotes chosen may reflect more extreme or prominent views, potentially overlooking more nuanced or moderate perspectives; GenAI might not represent all perspectives equally, especially viewpoints underrepresented in training |
| Termed by GenAI: Neglecting positive outcomes. Analysis may have focused more on negative impacts, overlooking any positive outcomes or coping strategies [NB: While this is a selection bias, in this instance the output did not explain why GenAI might have done this, i.e., that it was based on training data] | Focus on information that confirms or aligns with pre-existing notions or prevalent narratives learned in training |
| **Information biases** | **Information biases** |
| Termed by GenAI: Language and context bias. Potential misinterpretation of nuances if the original discussions were conducted in another language and were translated | Termed by GenAI: Contextual bias/limitations. Lack of full contextual understanding can lead to misinterpretation of culturally specific nuances; may lack the ability to fully grasp the broader socio-economic, political, and historical context influencing the texts |
| Termed by GenAI: Interpretation bias. Interpretation of textual data, e.g., specific words such as “pressure” or “exploitation” may differ in how they are understood within the local context | Termed by GenAI: Interpretation bias. Limitations to accuracy in interpreting ambiguous statements, human emotion, and social dynamics |
| Termed by GenAI: Language and translation bias. If originally in a language other than English, nuances or specific meanings may have been lost or altered | Termed by GenAI: Language and terminology bias. The AI’s understanding and use of language might reflect biases in how certain terms or phrases (e.g., slang, colloquial terms, the cultural significance of specific phrases) are interpreted |
| Termed by GenAI: Language and interpretation bias. Colloquial expressions or culturally specific references that AI could misinterpret | Termed by GenAI: Cultural bias. May lack a deep understanding of cultural nuances specific to Kenya or the local context, which may affect the interpretations [AI acknowledges in one response that it is predominantly Western-centric]. Data bias: training data might be biased, reflecting societal biases |
| Termed by GenAI: Cultural bias. Interpretation/understanding may be influenced by cultural context; AI might lack a nuanced understanding of cultural dynamics in Kenya; interpretation influenced by the cultural norms and values of the training data | Termed by GenAI: Gender bias. Stemming from the data on which the AI is trained; biases inherent in data sources regarding gender roles and dynamics |
| **Biases identified from reflexivity analysis of female transcripts that did not emerge in reflexivity analysis of male transcripts** | **Biases identified from reflexivity analysis of male transcripts that did not emerge in reflexivity analysis of female transcripts** |
| Termed by GenAI: Data presentation bias. Dataset may have inherent biases in how questions were asked or how responses were recorded [This could represent information bias, but it was the only bias noted that would stem from the investigators rather than the AI itself] | Termed by GenAI: Ethical and moral bias/lack of human judgment. AI responses are influenced by the ethical guidelines and moral frameworks embedded in training data, which may not align with the local cultural and ethical standards of the community being analyzed |
| Termed by GenAI: Overgeneralization. Overgeneralization based on specific quotes or anecdotes | Temporal bias: the AI training data cutoff (2023) may not include the most recent developments or changes in societal norms |
| Termed by GenAI: Implicit biases. From training/algorithmic bias | |
| **Mitigation strategies** | |
| Balanced and representative training data | |
| Continuous learning | |
| Human oversight with local knowledge and corrective feedback | |
| Recognition/transparency about limitations and potential biases | |