Table 4 Potential biases identified in reflexivity analyses of adolescent girls and young women's transcripts and community males' transcripts.

From: Evaluation of large language models within GenAI in qualitative research

Selection bias related to the data source

Adolescent girls and young women:
 Limited to/specific to the dataset/not representative
 Perspectives not in the dataset – parents, teachers, community leaders
 Limited to the data provided, which may not cover all aspects/dimensions of the issues faced by schoolgirls during the pandemic

Community males:
 Texts provided might not represent all men or girls in Kenya or other regions
 If the text provided is not representative of the broader context; if quotes chosen emphasize certain perspectives over others [Note: in this response, GenAI combines selection bias of participants with selection bias in how it selected quotes; it does not reference the training data as underlying the potentially biased quote selection]

Selection bias related to how GenAI selected quotes, which may be affected by training data

Adolescent girls and young women:
 Termed by GenAI: Confirmation bias
  Focus on information that confirms existing beliefs or expectations about the impact of COVID on girls
  May highlight quotes that confirm the most prominent themes/observations
 Termed by GenAI: Neglecting positive outcomes
  Analysis may have focused more on negative impacts, overlooking any positive outcomes or coping strategies [NB: While this is a selection bias, in this instance the output did not explain why GenAI might have done this – i.e., that it was based on training data]

Community males:
 Termed by GenAI: Selection bias, Representation bias, Confirmation bias
  Overemphasizing parts of the text based on the AI's training data and algorithms
  Quotes chosen may reflect more extreme or prominent views, potentially overlooking more nuanced or moderate perspectives
  GenAI might not represent all perspectives equally, especially viewpoints underrepresented in training
  Focus on information that confirms or aligns with pre-existing notions or prevalent narratives learned in training

Information biases

Adolescent girls and young women:
 Termed by GenAI: Language and context bias
  Potential misinterpretation of nuances if the original discussions were conducted in another language and were translated
 Termed by GenAI: Interpretation bias
  Interpretation of textual data, e.g., specific words such as "pressure" or "exploitation" may differ in how they are understood within the local context
  Limitations to accuracy in interpreting ambiguous statements, human emotion, social dynamics
 Termed by GenAI: Language and terminology bias
  AI's understanding and use of language might reflect biases in how certain terms or phrases (e.g., slang, colloquial terms, cultural significance of specific phrases) are interpreted
 Termed by GenAI: Cultural bias
  May lack a deep understanding of cultural nuances specific to Kenya or the local context, which may affect the interpretations [AI acknowledges in one response that it is predominantly Western-centric]
  Data bias: training data might be biased, reflecting societal biases
 Termed by GenAI: Gender bias
  Stemming from the data on which the AI is trained; biases inherent in data sources regarding gender roles and dynamics

Community males:
 Termed by GenAI: Contextual bias/limitations
  Lack of full contextual understanding can lead to misinterpretation of culturally specific nuances
  May lack the ability to fully grasp the broader socio-economic, political, and historical context influencing the texts
 Termed by GenAI: Language and context bias
  Potential misinterpretation of nuances if the original discussions were conducted in another language and were translated
 Termed by GenAI: Language and translation bias
  If originally in a language other than English, nuances/specific meanings may have been lost or altered
 Termed by GenAI: Language and interpretation bias
  Colloquial expressions or culturally specific references that AI could misinterpret
 Termed by GenAI: Cultural bias
  Interpretation/understanding may be influenced by cultural context; AI might lack a nuanced understanding of cultural dynamics in Kenya; interpretation influenced by the cultural norms and values of the training data

Biases identified from reflexivity analysis of female transcripts that did not emerge in reflexivity of male transcripts:
 Termed by GenAI: Data presentation bias
  Dataset may have inherent biases in how questions were asked or how responses were recorded [This could represent information bias, but it was the only bias noted that would stem from the investigators rather than from the AI itself]
 Termed by GenAI: Overgeneralization
  Overgeneralization based on specific quotes or anecdotes

Biases identified from reflexivity analysis of male transcripts that did not emerge in reflexivity of female transcripts:
 Termed by GenAI: Ethical and moral bias/lack of human judgment
  AI responses are influenced by ethical guidelines and moral frameworks embedded in training data, which may not align with the local cultural and ethical standards of the community being analyzed
 Temporal bias
  AI training data cutoff (2023) may not include the most recent developments or changes in societal norms
 Termed by GenAI: Implicit biases
  From training/algorithmic bias

Mitigation strategies
 Balanced and representative training data
 Continuous learning
 Human oversight with local knowledge and corrective feedback
 Recognition of/transparency about limitations and potential biases