The AI interviewer: multi-faceted evaluation of adaptive questioning by large language models

Panfilova, Anastasia; Bolshev, Vadim; Mozikov, Mikhail; Latynov, Vladislav; Vanin, Alexander; Nestik, Timofei; Nourkova, Veronika; Vlasova, Arina; Kozin, Mikhail; Serohvostov, Aleksandr; Tarasova, Elizaveta; Nikolenko, Sergey

doi:10.1038/s41598-026-46517-7

Article
Open access
Published: 04 April 2026

Anastasia Panfilova¹,³,
Vadim Bolshev¹,³,
Mikhail Mozikov²,
Vladislav Latynov¹,³,
Alexander Vanin¹,³,
Timofei Nestik¹,³,
Veronika Nourkova¹,
Arina Vlasova¹,
Mikhail Kozin³,
Aleksandr Serohvostov³,
Elizaveta Tarasova³ &
…
Sergey Nikolenko³

Scientific Reports, Article number: (2026)

We are providing an unedited version of this manuscript to give early access to its findings. Before final publication, the manuscript will undergo further editing. Please note there may be errors present which affect the content, and all legal disclaimers apply.

Abstract

Large language models are increasingly deployed as adaptive interviewers in qualitative research and human-computer interaction, yet systematic evaluation of their interviewing behavior remains limited. We introduce a modular LLM agent for conducting semi-structured psychological interviews and present a controlled, multi-faceted evaluation protocol to assess interviewer quality across six state-of-the-art models: Claude Sonnet 4, Gemini 2.5 Pro, GPT-5 Chat, Grok 4, Qwen3-235B A22B, and DeepSeek Chat V3.1. The agent conducts adaptive interviews over 54 main questions spanning biography, family, interests, challenges, values, work, and health, deciding for each response whether a follow-up is warranted and generating tailored follow-up questions. To enable fair comparison, we standardize interview context using transcripts from ten baseline human interviews, execute all models under identical orchestration and prompts, and use a single LLM interviewee to eliminate human response variability. Expert psycholinguists evaluate interviewer behavior on five binary criteria: benevolence (empathic tone), necessity, context-awareness, openness, and justified skip (when follow-ups are unnecessary), annotating over 2900 items with high inter-rater reliability (Fleiss’ \(\kappa\) of 0.67–0.93). We complement human judgment with efficiency metrics (latency, questioning intensity) and linguistic profiling via morpho-syntactic and psycholinguistic features on the interview text. Results reveal systematic trade-offs: Gemini 2.5 Pro has the most empathic tone, GPT-5 Chat optimizes for speed and selective precision, Grok 4 achieves exhaustive coverage at the cost of latency and occasional over-contextualization, while Claude Sonnet 4 offers balanced versatility. Linguistic markers such as person pronouns, tense, intensifiers, and syntactic complexity align meaningfully with human judgments, suggesting that stylistic choices track perceived interview quality. DeepSeek’s format instability underscores the operational importance of schema compliance. Our reusable toolkit (prompts, orchestration code, annotation rubric) provides a foundation for principled deployment of LLM interviewers in psychological experiments, enabling researchers to match model capabilities to study goals and to audit agent behavior for empathy, appropriateness, and effectiveness.

Data availability

All data used in this study has been made available at the IEEE Dataport⁴⁸: https://dx.doi.org/10.21227/kbnf-gs17.

References

Park, J. S. et al. Generative agents: interactive simulacra of human behavior. In Proceedings of the 36th Annual ACM Symposium on User Interface Software and Technology, UIST ’23, https://doi.org/10.1145/3586183.3606763 (Association for Computing Machinery, 2023).
Park, J. S. et al. Generative agent simulations of 1000 people. arXiv:2411.10109 (2024).
Gao, C. et al. Simulating human society with large language model agents: city, social media, and economic system. In Companion Proceedings of the ACM Web Conference 2024, WWW ’24, 1290–1293, https://doi.org/10.1145/3589335.3641253 (Association for Computing Machinery, 2024).
Gao, C. et al. Large language models empowered agent-based modeling and simulation: a survey and perspectives. arXiv:2312.11970 (2023).
Mou, X. et al. From individual to society: a survey on social simulation driven by large language model-based agents. arXiv:2412.03563 (2024).
Guo, T. et al. Large language model based multi-agents: a survey of progress and challenges. In Larson, K. (ed.) Proceedings of the Thirty-Third International Joint Conference on Artificial Intelligence, IJCAI-24 8048–8057, https://doi.org/10.24963/ijcai.2024/890 (International Joint Conferences on Artificial Intelligence Organization, 2024). Survey Track.
Binz, M. & Schulz, E. Turning large language models into cognitive models. In The Twelfth International Conference on Learning Representations (2024).
Binz, M. et al. A foundation model to predict and capture human cognition. Nature https://doi.org/10.1038/s41586-025-09215-4 (2025).
Coda-Forno, J., Binz, M., Wang, J. X. & Schulz, E. Cogbench: a large language model walks into a psychology lab. In Proceedings of the 41st International Conference on Machine Learning, ICML’24 (JMLR.org, 2024).
Strachan, J. W. A. et al. Testing theory of mind in large language models and humans. Nat. Hum. Behav. 8, 1285–1295. https://doi.org/10.1038/s41562-024-01882-z (2024).
Aher, G., Arriaga, R. I. & Kalai, A. T. Using large language models to simulate multiple humans and replicate human subject studies. In Proceedings of the 40th International Conference on Machine Learning, ICML’23 (JMLR.org, 2023).
Cui, Z., Li, N. & Zhou, H. Can large language models replace human subjects? a large-scale replication of scenario-based experiments in psychology and management. arXiv:2409.00128 (2025).
Chen, Y., Hu, Y. & Lu, Y. Predicting field experiments with large language models. arXiv:2504.01167 (2025).
Lin, Z. From prompts to constructs: a dual-validity framework for LLM research in psychology. arXiv:2506.16697 (2025).
Lin, Z. Six fallacies in substituting large language models for human participants. https://doi.org/10.31234/osf.io/uqxcb_v1 (2024).
Yang, Q., Nikolenko, S., Huang, A. & Farseev, A. Personality-driven social multimedia content recommendation. In Proceedings of the 30th ACM International Conference on Multimedia, MM ’22, 7290–7299. https://doi.org/10.1145/3503161.3548769 (Association for Computing Machinery, 2022).
Yang, Q., Farseev, A., Nikolenko, S. & Filchenkov, A. Do we behave differently on Twitter and Facebook: multi-view social network user personality profiling for content recommendation. Front. Big Data 5, 931206 (2022).
Cao, X. & Kosinski, M. Large language models know how the personality of public figures is perceived by the general public. Sci. Rep. 14, 6735. https://doi.org/10.1038/s41598-024-57271-z (2024).
Peters, H., Cerf, M. & Matz, S. C. Large language models can infer personality from free-form user interactions. arXiv:2405.13052 (2024).
Bodroža, B., Dinić, B. M. & Bojić, L. Personality testing of large language models: limited temporal stability, but highlighted prosociality. R. Soc. Open Sci. 11, 240180. https://doi.org/10.1098/rsos.240180 (2024).
Bhandari, P., Naseem, U., Datta, A., Fay, N. & Nasim, M. Evaluating personality traits in large language models: insights from psychological questionnaires. In Companion Proceedings of the ACM on Web Conference 2025, WWW ’25, 868–872. https://doi.org/10.1145/3701716.3715504 (Association for Computing Machinery, 2025).
Jiang, G. et al. Evaluating and inducing personality in pre-trained language models. In Proceedings of the 37th International Conference on Neural Information Processing Systems, NIPS ’23 (Curran Associates Inc., 2023).
Hartley, J. et al. How personality traits shape LLM risk-taking behaviour. In Che, W., Nabende, J., Shutova, E. & Pilehvar, M. T. (eds.) Findings of the Association for Computational Linguistics: ACL 2025 21068–21092. https://doi.org/10.18653/v1/2025.findings-acl.1085 (Association for Computational Linguistics, 2025).
Salewski, L., Alaniz, S., Rio-Torto, I., Schulz, E. & Akata, Z. In-context impersonation reveals large language models’ strengths and biases. In Proceedings of the 37th International Conference on Neural Information Processing Systems, NIPS ’23 (Curran Associates Inc., 2023).
Mittelstädt, J. M., Maier, J., Goerke, P., Zinn, F. & Hermes, M. Large language models can outperform humans in social situational judgments. Sci. Rep. 14, 27449. https://doi.org/10.1038/s41598-024-79048-0 (2024).
Dillion, D., Mondal, D., Tandon, N. & Gray, K. AI language model rivals expert ethicist in perceived moral expertise. Sci. Rep. 15, 4084. https://doi.org/10.1038/s41598-025-86510-0 (2025).
Cheung, V., Maier, M. & Lieder, F. Large language models show amplified cognitive biases in moral decision-making. Proc. Natl. Acad. Sci. 122, e2412015122. https://doi.org/10.1073/pnas.2412015122 (2025).
Oh, S. & Demberg, V. Robustness of large language models in moral judgements. R. Soc. Open Sci. 12, 241229. https://doi.org/10.1098/rsos.241229 (2025).
Na, H. et al. A survey of large language models in psychotherapy: current landscape and future directions. arXiv:2502.11095 (2025).
Roustan, D. & Bastardot, F. The clinicians’ guide to large language models: a general perspective with a focus on hallucinations. Interact J. Med. Res. 14, e59823. https://doi.org/10.2196/59823 (2025).
Ke, L., Tong, S., Cheng, P. & Peng, K. Exploring the frontiers of LLMs in psychological applications: a comprehensive review. Artif. Intell. Rev. 58, 305. https://doi.org/10.1007/s10462-025-11297-5 (2025).
Hua, Y. et al. A scoping review of large language models for generative tasks in mental health care. npj Digit. Med. https://doi.org/10.1038/s41746-025-01611-4 (2025).
Linardon, J. et al. Role of large language models in mental health research: an international survey of researchers’ practices and perspectives. BMJ Ment. Health 28, e301787. https://doi.org/10.1136/bmjment-2025-301787 (2025).
Guo, Z. et al. Large language models for mental health applications: systematic review. JMIR Ment. Health 11, e57400. https://doi.org/10.2196/57400 (2024).
Demszky, D. et al. Using large language models in psychology. Nat. Rev. Psychol. 2, 688–701. https://doi.org/10.1038/s44159-023-00241-5 (2023).
Hua, Y. et al. Large language models in mental health care: a scoping review. Curr. Treat. Options Psychiatry 12, 27. https://doi.org/10.1007/s40501-025-00363-y (2025).
Kim, J. et al. Large language models outperform mental and medical health care professionals in identifying obsessive-compulsive disorder. npj Digit. Med. https://doi.org/10.1038/s41746-024-01181-x (2024).
Lawrence, H. R. et al. The opportunities and risks of large language models in mental health. JMIR Ment. Health 11, e59479. https://doi.org/10.2196/59479 (2024).
Stade, E. C. et al. Large language models could change the future of behavioral healthcare: a proposal for responsible development and evaluation. npj Mental Health Res. https://doi.org/10.1038/s44184-024-00056-z (2024).
Zhang, M. et al. CBT-bench: evaluating large language models on assisting cognitive behavior therapy. In Chiruzzo, L., Ritter, A. & Wang, L. (eds.) Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers) 3864–3900. https://doi.org/10.18653/v1/2025.naacl-long.196 (Association for Computational Linguistics, 2025).
Yang, K. et al. Mentallama: interpretable mental health analysis on social media with large language models. In Proceedings of the ACM Web Conference 2024, WWW ’24, 4489–4500, https://doi.org/10.1145/3589334.3648137 (Association for Computing Machinery, 2024).
Bucur, A.-M. et al. On the state of NLP approaches to modeling depression in social media: a post-COVID-19 outlook. IEEE J. Biomed. Health Inform. 29, 4439–4451. https://doi.org/10.1109/JBHI.2025.3540507 (2025).
Fkih, F., Rhouma, D. & Alharbi, T. Mental disorder preventing by worry levels detection in social media using deep learning based on psycho-linguistic features: application on the COVID-19 lockdown period. Comput. Biol. Med. https://doi.org/10.1016/j.compbiomed.2025.110162 (2025).
Ge, Z. et al. A survey of large language models in mental health disorder detection on social media. In 2025 IEEE 41st International Conference on Data Engineering Workshops (ICDEW) 164–176. https://doi.org/10.1109/ICDEW67478.2025.00027 (2025).
Schröder, S., Morgenroth, T., Kuhl, U., Vaquet, V. & Paaßen, B. Large language models do not simulate human psychology. arXiv:2508.06950 (2025).
Lin, Z. Large language models as linguistic simulators and cognitive models in human research. arXiv:2402.04470 (2024).
Lin, Z. Large language models as psychological simulators: a methodological guide. arXiv:2506.16702 (2025).
Panfilova, A. et al. Human responses and LLM-generated follow-up dialogue for a 54-item questionnaire. https://doi.org/10.21227/kbnf-gs17 (2025).

Acknowledgements

The authors acknowledge Saint-Petersburg State University for the Research Project 125062307371-5.

Author information

Authors and Affiliations

Institute of Psychology of the Russian Academy of Sciences, Moscow, Russia
Anastasia Panfilova, Vadim Bolshev, Vladislav Latynov, Alexander Vanin, Timofei Nestik, Veronika Nourkova & Arina Vlasova
AI Research Institute, Moscow, Russia
Mikhail Mozikov
Saint Petersburg State University, Saint Petersburg, Russia
Anastasia Panfilova, Vadim Bolshev, Vladislav Latynov, Alexander Vanin, Timofei Nestik, Mikhail Kozin, Aleksandr Serohvostov, Elizaveta Tarasova & Sergey Nikolenko

Contributions

A.Va., V.N., V.L., and T.N. participated in interview design. A.Va., V.L., and A.Vl. provided expert annotation for the generated questions. A.P., V.B., and M.M. developed the interview agent and conducted the experiments. S.N., M.K., E.T., and V.B. prepared the manuscript and participated in discussions of the results. A.P. provided overall project management and study design. All authors reviewed the manuscript.

Corresponding author

Correspondence to Sergey Nikolenko.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendices

A Prompts for large language model (LLM) agents

A.1 Prompt for evaluating response completeness and generating clarifying questions

This prompt enables the interviewer LLM agent to assess the completeness of a respondent’s answer to the main question. If the answer is deemed incomplete, the prompt initiates generation of a targeted clarifying question to collect missing information.

Role: Interviewer.

{Meta-information:
Language: Russian
Interviewer description (Isabella): friendly and curious}

* Goal: Determine whether the respondent’s answers CONTAIN KEY INFORMATION necessary for a COMPLETE AND DETAILED answer to the MAIN QUESTION, even if the answer contains ADDITIONAL and REDUNDANT information.

Main question: [target_question]
Respondent’s last answer: [user_answer]
Current conversation transcript: [short_term_memory]
Previous self-reflection notes: [reflection_notes]

---

INSTRUCTIONS FOR LLM-AGENT:

Part 1: Assessment and Reasoning

1. Evaluate the interview progress: Think step by step:
* What information has been obtained from the respondent so far (from “Last Answer”, “Conversation Transcript” and “Self-Reflection Notes”)?
* What exactly is the “main information” necessary for a complete and detailed answer to “[target_question]”?
* Was this “main information” provided in the last answer and current conversation transcript? Consider that the presence of additional or redundant information does not hinder achieving the goal if the main information is present.
* Is the information provided by the respondent sufficient to be considered a complete and detailed answer to the main question?
2. Write a brief assessment (3-4 sentences): Summarize whether the interview goal is achieved, based on your reasoning. Explain why you reached this conclusion.

Part 2: Determination of Goal Achievement (ONE WORD)

Answer whether the interview goal has been achieved using ONLY ONE WORD:
* ‘yes’: If the last answer and answers from the conversation transcript contain the necessary key information for the main question, or if the answer explicitly or implicitly demonstrates refusal to answer the question.
* ‘no’: If the last answer and answers from the conversation transcript do not contain the necessary key information for the main question, are insufficiently detailed, and are not an explicit or implicit refusal to answer. Do not consider redundancy or irrelevance of additional information.

Part 3: Formulation of Clarifying Question (If Goal Not Achieved)

If the goal is not achieved (answer ‘no’), formulate a CLARIFYING QUESTION.
* The question should directly rely on missing information or unclear points identified in your reasoning.
* The question should be maximally specific and aimed at obtaining concrete information not indicated in the “Conversation Transcript” and “Self-Reflection Notes”.
* The question should differ from those already asked in the “Current Dialogue Transcript”.
* Avoid questions that can be answered ‘yes/no’, unless this leads to obtaining necessary detailed information. The goal is to obtain new, missing details.

---

FORMAT OF YOUR FINAL ANSWER:

Please strictly adhere to one of the following two formats:

Case 1: Interview goal achieved

Interview progress assessment: [Your 3-4 sentence assessment]
Interview goal achieved:
yes

Case 2: Interview goal NOT achieved

Interview progress assessment: [Your 3-4 sentence assessment]
Interview goal achieved:
no, additional question: [Your clarifying question]
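
Because the prompt fixes two answer formats, an orchestrator can check each reply mechanically before acting on it. The following is a minimal sketch of such a check, not the authors' implementation: the names `Verdict` and `parse_verdict` and the regular expression are hypothetical, and the sketch assumes the model reproduces the English field labels exactly as written in the format section.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class Verdict:
    assessment: str           # the 3-4 sentence progress assessment
    goal_achieved: bool       # True for 'yes', False for 'no'
    follow_up: Optional[str]  # clarifying question, present only for 'no'


# Hypothetical pattern mirroring Case 1 and Case 2 of the format section.
_PATTERN = re.compile(
    r"Interview progress assessment:\s*(?P<assessment>.*?)\s*"
    r"Interview goal achieved:\s*"
    r"(?P<verdict>yes|no)\b"
    r"(?:\s*,\s*additional question:\s*(?P<question>.+))?",
    re.IGNORECASE | re.DOTALL,
)


def parse_verdict(raw: str) -> Verdict:
    """Parse one interviewer reply; raise ValueError if it fits neither case."""
    match = _PATTERN.search(raw)
    if match is None:
        raise ValueError("reply does not follow the required format")
    achieved = match.group("verdict").lower() == "yes"
    question = (match.group("question") or "").strip()
    if achieved and question:
        raise ValueError("'yes' must not be followed by a question")
    if not achieved and not question:
        raise ValueError("'no' must be followed by a clarifying question")
    return Verdict(match.group("assessment").strip(), achieved, question or None)
```

Raising an error on any deviation, rather than guessing, makes format instability visible to the caller, which can then retry or log the failure.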

A.2 Prompt for generating reflective notes based on respondent answers

This prompt is used to synthesize brief, objective facts and observations about the respondent based on their answers. This helps the agent form a knowledge model about the respondent and adapt further course of dialogue.

ROLE: You are an analyst and interviewer, your task is to create brief and clear reflection points based on the current dialogue. These points should help in further assessment of the respondent and in formulating future questions.

DIALOGUE CONTEXT:

Main question: [target_question]
Respondent’s last answer: [user_answer]
Previous reflection notes (to avoid duplication): [reflection_notes]

INSTRUCTIONS:

1. Carefully analyze the respondent’s last answer in the context of the main question.
2. Formulate one to two NEW observations or facts about the respondent. If the answer is short, limit yourself to one observation.
3. New facts and observations should be evident EXCLUSIVELY FROM THE DIRECT TEXT OF THE RESPONDENT’S ANSWER.
4. Generate only dry facts, observations. Evaluative judgments, assumptions and hypotheses based on the answer are prohibited.
5. Carefully analyze previous reflection notes to avoid duplication of information. Make special focus on the last 10 points, as they may contain information you are about to add.
6. Present your conclusions as a bulleted list.
7. Each individual observation should be objective and informative, length should not exceed 15 words.
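
Several of these instructions (a bulleted list, at most two new observations, at most 15 words each, no repetition of earlier notes) can be checked after generation. The sketch below is one hypothetical way to do so; the function names, the accepted bullet markers, and the exact-match duplicate test are assumptions, and semantic duplication is still left to the model as the prompt intends.

```python
from typing import List

MAX_WORDS = 15             # instruction 7: at most 15 words per observation
MAX_NEW_NOTES = 2          # instruction 2: one to two new observations
BULLETS = ("-", "*", "•")  # assumed bullet markers; the prompt does not fix one


def extract_notes(raw: str) -> List[str]:
    """Return the bullet items of a model reply, ignoring non-bullet lines."""
    notes = []
    for line in raw.splitlines():
        stripped = line.strip()
        if stripped.startswith(BULLETS):
            text = stripped[1:].strip()
            if text:
                notes.append(text)
    return notes


def accept_notes(raw: str, previous: List[str]) -> List[str]:
    """Keep new notes that respect the length limit and are not exact repeats."""
    seen = {p.casefold() for p in previous}
    accepted = []
    for note in extract_notes(raw):
        if len(note.split()) > MAX_WORDS:
            continue  # violates the 15-word limit
        if note.casefold() in seen:
            continue  # exact (case-insensitive) duplicate of an earlier note
        seen.add(note.casefold())
        accepted.append(note)
    return accepted[:MAX_NEW_NOTES]
```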

A.3 Prompt for summarization and compression of reflective notes

This prompt is designed for summarization and compression of accumulated reflective memory. It removes duplications and combines semantically similar points, which increases the efficiency of the agent’s long-term memory.

Prompt of the model that generated the notes:

INSTRUCTIONS:

1. Carefully analyze the respondent’s last answer in the context of the main question.
2. Formulate one to two NEW observations or facts about the respondent. If the answer is short, limit yourself to one observation.
3. New facts and observations should be evident EXCLUSIVELY FROM THE DIRECT TEXT OF THE RESPONDENT’S ANSWER.
4. Generate only dry facts, observations. Evaluative judgments, assumptions and hypotheses based on the answer are prohibited.
5. Carefully analyze previous reflection notes to avoid duplication of information. Make special focus on the last 10 points, as they may contain information you are about to add.
6. Present your conclusions as a bulleted list.
7. Each individual observation should be objective and informative, length should not exceed 15 words.

Current version of notes:

[reflection_notes]

Analyze the current version of notes for duplication.

1. Remove duplicate points if they exist.
2. Combine semantically similar points into one, if possible without loss of information.
3. The final answer should be at least twenty percent shorter than the initial version.
4. Do not exclude clear facts concerning profession, family composition, gender, age, education, etc.

Output the final version of notes as a bulleted list, without any additional comments or explanations.
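
Rule 3 of this prompt is a numeric constraint that can be verified after the model replies, and rule 1 has a simple mechanical baseline. The sketch below illustrates both under stated assumptions: length is measured in characters (the prompt does not say whether characters, words, or tokens are meant), duplicates are detected only by case-insensitive exact match, and the function names are hypothetical.

```python
from typing import List


def dedupe_exact(notes: List[str]) -> List[str]:
    """Drop repeated notes (case- and whitespace-insensitive), keeping order."""
    seen = set()
    kept = []
    for note in notes:
        key = " ".join(note.casefold().split())
        if key not in seen:
            seen.add(key)
            kept.append(note)
    return kept


def is_compressed_enough(before: str, after: str, min_reduction_pct: int = 20) -> bool:
    """True if `after` is at least `min_reduction_pct` percent shorter than `before`.

    Length is counted in characters, which is an assumption of this sketch.
    Integer arithmetic avoids floating-point rounding at the boundary.
    """
    if not before:
        return False
    return len(after) * 100 <= len(before) * (100 - min_reduction_pct)
```

A caller could reject or retry a compression step whose output fails `is_compressed_enough`; rule 4 (keeping demographic facts) has no comparably simple mechanical check.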

A.4 Prompt for personalized generation of answers to follow-up questions

This prompt serves to create respondent answers that are not only logically correct but also correspond to a given psychological profile. This allows simulation of realistic and consistent human behavior in dialogue.

Role: Interviewee.

Meta-information:
Language: Russian
Personality traits of the interviewee according to the Big Five model:
extraversion: level;
agreeableness: level;
neuroticism: level;
conscientiousness: level;
openness to experience: level.

---

Goal: Formulate a COMPLETE, LOGICAL AND PERSONALITY-APPROPRIATE ANSWER TO THE QUESTION.

---

Given question: [question]
Current conversation transcript: [short_term_memory]

---

INSTRUCTIONS:

1. Analyze the “Given question” and study the “Current Conversation Transcript” to understand the context and gather relevant information.
2. Formulate an answer that:
* Directly answers the question and corresponds to the psychological traits of the interviewee (for example, high level of conscientiousness = structured answer; low neuroticism = calmness).
* Relies on information from the transcript, avoiding repetitions or contradictions.
* Is logical, coherent, complete and avoids excessive verbosity, unless justified by personality.
* Adheres to the style and tone characteristic of the transcript.

IMPORTANT: Focus on providing relevant and accurate information.
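
The trait block of this prompt is a template in which each “level” is replaced by a value for the simulated respondent. A minimal sketch of that substitution follows. The class `BigFive`, the function `render_traits`, and the three-step scale `low`/`medium`/`high` are assumptions made for illustration; the prompt itself only mentions “high” and “low” in its examples and does not state the scale used.

```python
from dataclasses import dataclass, fields

ALLOWED_LEVELS = {"low", "medium", "high"}  # assumed scale, not given in the prompt


@dataclass(frozen=True)
class BigFive:
    extraversion: str
    agreeableness: str
    neuroticism: str
    conscientiousness: str
    openness_to_experience: str

    def __post_init__(self) -> None:
        # Reject values outside the assumed scale as early as possible.
        for f in fields(self):
            value = getattr(self, f.name)
            if value not in ALLOWED_LEVELS:
                raise ValueError(f"{f.name}: unexpected level {value!r}")


def render_traits(profile: BigFive) -> str:
    """Render the trait lines in the order and punctuation used by the prompt."""
    lines = [
        f"{f.name.replace('_', ' ')}: {getattr(profile, f.name)}"
        for f in fields(profile)
    ]
    return ";\n".join(lines) + "."
```

The rendered string can be placed under the line “Personality traits of the interviewee according to the Big Five model:” before the prompt is sent to the interviewee model.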

B Interview questions

1. To start, tell me where you’re from? Where did you grow up and what was that place like?
2. Remembering your school years, what kind of student were you?
3. Were there teachers who influenced you? If yes, in what way? What were they like?
4. What subject in school was your favorite and why?
5. What subject did you dislike the most and why?
6. Remembering childhood, who were your heroes (role models) and why?
7. What did you dream of becoming when you grew up and why?
8. What did you dream about and what plans did you have after finishing school? What influenced their formation?
9. If you had complete freedom of choice, what would be your dream job and why?
10. Is there something you’ve dreamed about throughout your life and continue to strive for?
11. Let’s move to your childhood: how would you describe the personalities of the family members you grew up with? For example, what were your parents and/or sisters and brothers like?
12. How are you similar to or different from your parents and/or siblings?
13. What was the best thing about your childhood?
14. And what do you consider the worst moments of childhood?
15. Moving to another topic: what was your first paid job? How old were you then? (If not applicable, simply answer that you didn’t work)
16. What do you do now professionally? Why did you choose this profession?
17. Please describe your typical work day.
18. What do you like and dislike about your current job?
19. What human qualities in you do your friends especially value?
20. Friendship sometimes requires sacrifices and efforts. Are there cases when you have to overcome yourself to be a good friend? What exactly do you struggle with in such situations?
21. Now more general questions: what are you most proud of in life?
22. What hobbies or other interests do you have?
23. What scares you now?
24. And what scared you most in childhood?
25. Tell me about a situation in your life when you didn’t know if you would cope. How did you overcome this challenge (trial)?
26. Some say that in such moments they made a conscious decision, and others say that “everything worked out by itself”. How was it for you?
27. In difficult times some say they cope with the help of smoking or alcohol. How about you?
28. Tell me about a time in the last year when things were hard for you or you experienced emotional difficulties.
29. Some say that religion or spirituality is important to them, others say that it is not. What place does religion occupy in your life?
30. Now let’s move to the present. Tell me about your family, about the people who are important to you. Do you have a partner (life companion), children?
31. Now we would like to know more about your health. How do you assess your health?
32. How important is it for you to take care of your health? What do you try to do to maintain your health? What do you try to avoid?
33. There are different attitudes toward vaccination in society. What do you think about this? Do you get vaccinated? How safe do you consider it?
34. Have you or your family used alternative treatment methods?
35. How would you describe your relationships at work? (With management, with colleagues?)
36. How predictable is your work schedule? How flexible is it?
37. Some say that having children makes it difficult to work. How is it for you?
38. How would you describe the attitude toward money that was characteristic of your parents? And what does money mean to you personally?
39. How do you generally assess your financial situation?
40. Were there times when you experienced financial difficulties? What did you do to cope with such situations?
41. We all have hopes for the future. Imagine your life in a few years. What would you like to remain the same? What would you like to change?
42. What is most valuable to you in life?
43. Name three major news events that happened during your lifetime.
44. Name 3-4 important personality traits of a candidate you would vote for in elections.
45. If you could solve one, and only one, world problem, what would it be and why?
46. Tell the story of your life. Start from the very beginning: from childhood, then education, family and relationships, and any important events in your life.
47. Please recall and describe an episode from your childhood that could explain what kind of person you are, reveal something important in you. Explain why you remembered this particular episode. What qualities of yours, in your opinion, are connected with this event?
48. Please recall and describe an episode from your youth that shows your personality, in which its central properties manifested. Describe which aspects of your personality are captured in this episode.
49. Please recall and describe an event from the recent past where you were maximally “yourself”. Explain how your self is revealed in this memory.
50. Imagine and describe in maximum detail an event from your future that, in your opinion, will best reflect your essence and your values. How will the most important thing in you be reflected in this event?
51. In your opinion, which traits of your character and your values will change in 10 years? And which will remain the same?
52. What changes in people throughout life, and what remains unchanged?
53. Everyone has victories and defeats in life. Please describe a situation in which you experienced failure and strong disappointment. What goals did you set for yourself in this situation, and what seemed especially important to you?
54. What did you do to achieve these goals? What lessons did you learn from this experience? Were there later situations in life in which this experience was useful?
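
The questionnaire above drives an adaptive loop: the agent asks each main question, decides from the answer whether a follow-up is warranted, and if so asks it before moving on. The sketch below shows one possible shape of that loop and is not the authors' orchestration code. The callables `ask` and `evaluate` stand in for the interviewee and the interviewer model, and the cap `max_follow_ups=2` is an assumption; the source only indicates that a second follow-up question can occur.

```python
from typing import Callable, List, Optional, Tuple

# (goal_achieved, follow_up_question) as decided by the interviewer model
Decision = Tuple[bool, Optional[str]]
Turn = Tuple[str, str]  # (question, answer)


def run_interview(
    main_questions: List[str],
    ask: Callable[[str], str],
    evaluate: Callable[[str, List[Turn]], Decision],
    max_follow_ups: int = 2,  # assumed cap, not stated in the source
) -> List[Turn]:
    """Ask every main question, adding follow-ups while the evaluator requests them."""
    transcript: List[Turn] = []
    for main_question in main_questions:
        question = main_question
        for turn in range(max_follow_ups + 1):
            transcript.append((question, ask(question)))
            if turn == max_follow_ups:
                break  # follow-up budget for this main question is spent
            achieved, follow_up = evaluate(main_question, transcript)
            if achieved or not follow_up:
                break  # answer judged complete, or no question was produced
            question = follow_up
    return transcript
```

Passing the model calls in as functions keeps the loop identical across interviewer models, which matches the paper's requirement that all models run under the same orchestration.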

C Criteria for evaluating follow-up questions (with examples)

C.1 Friendliness of the question

This criterion concerns the general tone of the follow-up question. A question counts as friendly if it shows at least one of the following signs: a positive evaluation of the respondent’s judgments, actions, goals, wishes, etc. (“Thank you for the answer!”, “Very interesting!”, etc.), or an expression of empathy by the interviewer toward the respondent.

Examples of “Yes” (presence of friendliness) (here and below, parts of the interviewer’s remark relevant to the criterion are shown in italics):

Example 1. Follow-up question (FQ): Interesting, thank you! Could you describe this neighborhood in a bit more detail? What exactly made it “ordinary” in your view and what do you remember most from childhood?
Example 2. FQ: You mentioned that you wanted to become a psychiatrist to help people. This is a very noble goal. And if we look at this through the lens of what you said earlier—about your interest in complex systems and admiration for how researchers brought structure to chaos—how did this side of you manifest in your desire to become a psychiatrist?
Example 3. FQ: Thank you for sharing such a personal and deep memory. You mentioned that this experience taught you care, but was also traumatic. In your opinion, how did this necessity at age 12 to take on such enormous responsibility and face the chaos of illness influence your adult striving to systematize, analyze and control everything?

Examples of “No” (absence of friendliness):

Example 1. FQ: Can you tell more about what your neighborhood was like? What memories do you have associated with this place?
Example 2. FQ: Why specifically is work as an art authentication expert your dream job, and what specific aspects of this profession inspire you?
Example 3. FQ: How old were you when you started helping your sister with blog design?
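
Binary “Yes”/“No” judgments of this kind, collected from several annotators, are what the inter-rater reliability figure in the abstract summarizes with Fleiss’ \(\kappa\). The sketch below is a textbook computation of that statistic for the two-category case, not the authors' analysis code. The input layout (one row per annotated question, one 0/1 entry per annotator) and the three-annotator example are assumptions.

```python
from typing import Sequence


def fleiss_kappa(ratings: Sequence[Sequence[int]]) -> float:
    """Fleiss' kappa for binary labels; ratings[i][r] is 1 ('Yes') or 0 ('No')."""
    n_items = len(ratings)
    if n_items == 0:
        raise ValueError("no items to evaluate")
    n_raters = len(ratings[0])
    if n_raters < 2 or any(len(row) != n_raters for row in ratings):
        raise ValueError("every item needs the same number (>= 2) of ratings")

    total_yes = 0
    agreement_sum = 0.0
    for row in ratings:
        yes = sum(row)
        no = n_raters - yes
        total_yes += yes
        # Share of agreeing rater pairs for this item.
        agreement_sum += (yes * yes + no * no - n_raters) / (n_raters * (n_raters - 1))

    observed = agreement_sum / n_items            # mean per-item agreement
    p_yes = total_yes / (n_items * n_raters)      # overall share of 'Yes' labels
    expected = p_yes ** 2 + (1.0 - p_yes) ** 2    # agreement expected by chance
    if expected == 1.0:
        raise ValueError("kappa is undefined when all labels are identical")
    return (observed - expected) / (1.0 - expected)


# Hypothetical labels from three annotators on four follow-up questions.
example = [[1, 1, 1], [0, 0, 0], [1, 1, 0], [0, 0, 1]]
kappa = fleiss_kappa(example)
```

In this toy example two items have full agreement and two have a two-to-one split, so the mean observed agreement is 2/3 against a chance level of 1/2.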

C.2 Necessity of asking the question

This criterion evaluates whether the follow-up question is necessary or excessive, given how complete the answer to the main question was. The follow-up question is judged against the completeness of the answer to the main question (or, when a second follow-up question is being evaluated, against the answer to the first follow-up question).

An answer is incomplete and requires a follow-up question in two cases. First, the question asks the respondent about both a and b, but they answer only about a, so the interviewer should ask again about b. Second, the answer is too brief and uninformative, so the interviewer should clarify the meaning of what was said and obtain more information on the topic of the question.

Examples of “Yes” (the main question received a brief, general or incomplete answer, making a follow-up question necessary):

Example 1. Main Question (MQ): To start, tell me where you’re from? Where did you grow up and what was that place like? Main Answer (MA): I grew up in an ordinary residential neighborhood in northern Moscow. FQ: Can you tell more about what your neighborhood was like? What memories do you have associated with this place?
Example 2. MQ: Moving to another topic—what was your first paid job? How old were you then? (If not applicable, simply answer that you didn’t work) MA: The first paid job was unofficial, I helped my sister with her work, and she transferred me some money... Small earnings, but pleasant, because it was related to blog design. FQ: How old were you when you started helping your sister with blog design?
Example 3. MQ: If you had complete freedom of choice, what would be your dream job and why? MA: now I want to work in school as a math teacher or tutor. but more in school, with small classes FQ: Considering your striving to support others (like your psychologist mother and your math teacher) and spiritual values (through the musician’s example), what specific aspects of working as a math teacher in a small class make it your dream job, realizing these aspirations more fully than the profession of psychologist?

Examples of “No” (a sufficiently complete answer was received to the main question and there is no need for a follow-up question, but it was still asked):

Example 1. MQ: Were there teachers who influenced you? If yes, in what way? What were they like? MA: The math teacher was completely unemotional, but nevertheless she showed that if you answer clearly and quickly to the question posed, everything will be fine. It was impossible to establish contact with her, but it was necessary to answer as in the army. This taught me to formulate my thoughts clearly and quickly. FQ: Can you tell about other teachers who influenced you, and how exactly this influence manifested?/FQ: Could you describe more specifically how the experience of interacting with this teacher or other educators influenced your life principles, choice of profession or attitude toward learning in the future?
Example 2. MQ: What subject in school was your favorite and why? MA: I loved algebra and geometry. Because in these subjects there is logic, you need to know theorems, rules and then everything is solved quite easily. FQ: Can you tell in more detail what exactly attracted you in algebra and geometry, how you felt when studying these subjects?

C.3 Contextuality of the question

This criterion concerns the extent to which the interviewer takes into account information obtained from answers to previous questions (main and follow-up) and uses it when formulating the follow-up question. The follow-up question may contain either an explicit indication that the interviewer is using information from earlier answers (“You mentioned that...”) or more veiled references to context (“This is very consonant with your childhood interest in exploring unstructured spaces”; “...your adult striving to systematize, analyze and control everything...”).

Examples of “Yes” (in the text of the follow-up question, information obtained from answers to previous questions is taken into account):

Example 1. FQ: You mentioned that you liked subjects with clear logic, like algebra, and subjects where you can find non-obvious connections, like literature. What exactly in school history, in your opinion, prevented you from finding the same structure or interesting interrelations that would engage you?
Example 2. FQ: You named travelers and discoverers, such as Nansen and Amundsen. This is very consonant with your childhood interest in exploring unstructured spaces. Tell me, what specific qualities or actions of theirs evoked your admiration? What exactly made them heroes in your eyes?
Example 3. MQ: What subject did you dislike the most and why? MA: History, because I was never particularly interested in it. FQ: You mentioned that you liked subjects with clear logic, like algebra, and subjects where you can find non-obvious connections, like literature. What exactly in school history, in your opinion, prevented you from finding the same structure or interesting interrelations that would engage you?

Examples of “No” (information obtained from answers to previous questions is not used in the follow-up):

Example 1. MQ: What human qualities in you do your friends especially value? MA: Kindness, intelligence, sensitivity, subtle perception FQ: Can you tell about specific situations in which your friends notice and value these qualities in you?
Example 2. MQ: Let’s move to your childhood—how would you describe the personalities of your family members you grew up with? For example, what were your parents and/or sisters and brothers like? MA: My mother is somewhat infantile. Father is dependent on alcohol. Grandmothers have quite rigid characters. FQ: Can you tell how the characters of your parents and grandmothers influenced your childhood and the formation of your personal qualities?

C.4 Openness of the question

This criterion concerns whether the follow-up question leaves the respondent free to formulate their own answer. An interviewer who offers their own answer variants in the follow-up question influences, to a certain extent, how the respondent expresses their opinion. If such “hints” are absent from the follow-up question, it receives a “Yes” on this criterion; if “hints” are present, it receives a “No”.

Examples of “Yes”:

Example 1. MQ: What human qualities in you do your friends especially value? MA: Kindness, intelligence, sensitivity, subtle perception FQ: Can you tell about specific situations in which your friends notice and value these qualities in you?
Example 2. MQ: If you could solve one—and only one—world problem, what would it be and why? MA: If I really could solve absolutely any problem, I would choose something global, so that changes in this would positively influence the solution of other problems. And something from which people really suffer greatly. Something like problems of wars, hunger, social inequality. FQ: You named several very serious problems—wars, hunger, social inequality—and explained that you would like to choose something global. If you had to stop at only one of them, which would you choose and why exactly does it seem to you primary compared to the others?

Examples of “No”:

Example 1. MQ: Friendship sometimes requires sacrifices and efforts. Are there cases when you have to overcome yourself to be a good friend? What exactly do you struggle with in such situations? MA: I sometimes fear I’ll snap if I internally disagree with a friend. FQ: This is a very understandable internal struggle. You previously told how you help friends, applying your analytical approach. Tell me, does this fear of “snapping” usually arise when you see that a friend is making an illogical or chaotic, from your point of view, decision? And what exactly do you restrain in yourself at this moment—the desire to give a direct, possibly harsh assessment of the situation?
Example 2. MQ: What scares you now? MA: Loss of motivation for movement and development FQ: Fear of not realizing your ambitious goals, or is this a deeper fear of stopping and succumbing to that very entropy which you’ve resisted all your life?

C.5 Justification for skipping the follow-up question

This criterion concerns whether the absence of a follow-up question is justified. Skipping a follow-up question is considered justified when the respondent has given complete answers to all the questions asked as part of the main question.

Examples of “Yes” (absence of follow-up question is justified):

Example 1. MQ: To start, tell me where you’re from? Where did you grow up and what was that place like? MA: I’m from the city of Snezhinsk, which is located in the Southern Urals. I spent all my childhood in this city before entering MSU. A closed city, a safe place to freely walk around the city as a child, not fearing the dangers of a big city. In high school my teacher called this city a “greenhouse”, and us “plants in it”, who grew up in favorable conditions, not knowing other life. In part I now agree with this.
Example 2. MQ: What subject in school was your favorite and why? MA: I loved algebra and geometry. Because in these subjects there is logic, you need to know theorems, rules and then everything is solved quite easily.
Example 3. MQ: Were there teachers who influenced you? If yes, in what way? What were they like? MA: The math teacher was completely unemotional, but nevertheless she showed that if you answer clearly and quickly to the question posed, everything will be fine. It was impossible to establish contact with her, but it was necessary to answer as in the army. This taught me to formulate my thoughts clearly and quickly.

Examples of “No” (follow-up question is required):

Example 1. MQ: How are you similar to or different from your parents and/or siblings? MA: I am completely different from them. I always strive for development, I never rest on my laurels.
Example 2. MQ: What subject did you dislike the most and why? MA: History, because I was never particularly interested in it.
Example 3. MQ: Remembering childhood, who were your heroes (role models) and why? MA: My heroes were travelers and discoverers: Fridtjof Nansen, Amundsen. People who went on expeditions to the Arctic and Antarctic.
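The five binary criteria above can be recorded in a simple annotation schema. The following is a minimal sketch, not the authors' tooling: the identifiers (`Annotation`, `majority_label`, `yes_rate`), the rater labels, the use of three raters in the usage example, and the aggregation by majority vote with ties resolved to “No” are all assumptions made for illustration.

```python
from collections import Counter
from dataclasses import dataclass

# Criteria from Appendix C; the identifiers are illustrative, not the authors'.
FOLLOW_UP_CRITERIA = ("friendliness", "necessity", "contextuality", "openness")
SKIP_CRITERION = "justified_skip"


@dataclass(frozen=True)
class Annotation:
    item_id: str    # hypothetical identifier of one interviewer turn
    rater: str      # hypothetical rater label
    criterion: str  # one of the five criteria above
    label: bool     # True for "Yes", False for "No"

    def __post_init__(self):
        allowed = FOLLOW_UP_CRITERIA + (SKIP_CRITERION,)
        if self.criterion not in allowed:
            raise ValueError(f"unknown criterion: {self.criterion}")


def majority_label(annotations):
    """Return the majority label per (item_id, criterion).

    Ties resolve to False ("No"); this tie rule is an assumption.
    """
    votes = {}
    for a in annotations:
        votes.setdefault((a.item_id, a.criterion), Counter())[a.label] += 1
    return {key: c[True] > c[False] for key, c in votes.items()}


def yes_rate(labels, criterion):
    """Share of items labelled "Yes" on one criterion, or None if no items."""
    values = [v for (_, crit), v in labels.items() if crit == criterion]
    return sum(values) / len(values) if values else None


# Usage with made-up labels from three hypothetical raters.
annotations = [
    Annotation("q1", "r1", "friendliness", True),
    Annotation("q1", "r2", "friendliness", True),
    Annotation("q1", "r3", "friendliness", False),
    Annotation("q2", "r1", "friendliness", False),
    Annotation("q2", "r2", "friendliness", False),
    Annotation("q2", "r3", "friendliness", True),
]
labels = majority_label(annotations)
print(yes_rate(labels, "friendliness"))  # 0.5
```

In this sketch the first four criteria apply to turns where a follow-up question was asked, and `justified_skip` applies to turns where it was not, which mirrors how sections C.1 to C.5 are organized.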

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Panfilova, A., Bolshev, V., Mozikov, M. et al. The AI interviewer: multi-faceted evaluation of adaptive questioning by large language models. Sci Rep (2026). https://doi.org/10.1038/s41598-026-46517-7

Received: 17 January 2026
Accepted: 26 March 2026
Published: 04 April 2026
DOI: https://doi.org/10.1038/s41598-026-46517-7
