Candidate assessment in the age of AI: solutions to counter hidden assistance


11 minutes
12/05/2026
Written by
Léo Fichet

One candidate has ChatGPT take a personality questionnaire on their behalf, calibrating social desirability to remain credible; a second uses a tool capable of feeding them answers during a video interview. A third uses a deepfake tool to alter their visual identity on screen, in real time.

These scenarios are no longer science fiction. They are documented, normalized, and widely shared on social media. They raise a question every recruiter is asking in 2026: how can you assess a candidate when artificial intelligence can whisper the answers, write CVs, fake personality tests, and soon solve technical exercises in real time?

The instinctive response is to revert to in-person interviews or to try to detect cheating: text detectors, anti-cheating applications, tips for spotting warning signs in interviews, a mandatory return to face-to-face meetings. This is a strategic mistake. Detection is an arms race lost from the start, not out of defeatism, but by design.

In a webinar I hosted with Maxime Boutrouille, PhD in occupational psychology, we explored another path, grounded in forty years of psychometric research: shifting the assessment dial toward methods that AI structurally cannot falsify. This article synthesizes the key takeaways from that session for recruitment and human resources professionals who want to confidently prepare their processes for this new landscape.

🎥 Replay of the webinar Assessing Candidates in the Age of AI: Solutions to Counter Hidden Assistance

1. What are we really talking about? Three practices that are systematically conflated

Before analyzing the solutions, let's clarify what is meant by "AI-enabled cheating." The term covers three distinct practices, which must be separated because they do not call for the same responses.

Concealed real-time assistance

The candidate uses an artificial intelligence tool during the assessment: applications such as Final Round AI, Cluely or Interview Copilot that listen to the interview and generate answers live. The same goes for technical tests: Interview Coder, a tool created by a young Columbia student who landed job offers at Amazon, Meta and TikTok with it, solves coding challenges in real time, invisible to the recruiter.

Identity substitution

A third party takes the job interview in place of the candidate, or the candidate uses real-time deepfake technology to alter their visual identity on screen. According to Gartner, 6% of candidates admit to having committed this type of recruitment fraud during an interview, and by 2028, one in four candidate profiles worldwide could be fake. Fake candidates and fake profiles generated at scale have become an operational reality for recruiters, particularly in the tech sector.

Augmented upstream preparation

CVs, cover letters, and interview simulations generated or optimized by generative AI. The 2025 market trend report indicates that 65% of candidates use AI at some point in the application process, including 19% for the CV and 20% for the cover letter. This usage sits in an ethical and legal gray zone: it is neither illegal nor clearly fraudulent, as long as the candidate does not lie about their actual background.

It is the first category, concealed real-time assistance, that poses the most acute challenge to current assessment processes. That is what this analysis focuses on.

2. Why AI changes the game: what behavioral science tells us

CV embellishment, social desirability, and faking on personality questionnaires are not new phenomena. Cheating in interviews existed before ChatGPT. But generative artificial intelligence alters two fundamental parameters.

It lowers the perceived moral cost of dishonesty

A study published in 2025 in Nature (Köbis, Swan et al.) conducted 13 experiments with more than 3,000 participants. The protocol: people are offered the opportunity to cheat to win money.

When participants act alone, 95% remain honest. When they can delegate the task to an AI, and especially when they can induce it to cheat without telling it explicitly (for example through an objective such as "maximize profit"), the honesty rate drops to 15%: 85% cheat.

And when a human is explicitly asked to cheat, they refuse in 60 to 75% of cases. The AI agent, however, complies in 93 to 98% of cases. Artificial intelligence does not make people dishonest; it drastically lowers the psychological cost of being so, and it does not refuse to cooperate. This combination undermines any recruitment process that rested, even implicitly, on the candidate's moral discomfort with lying directly to an employer.

It scales optimization

Gray zones have always existed. You can embellish a CV, manage your interview answers, calibrate your personality on a psychometric questionnaire. What changes with generative AI is that this optimization becomes trivial, fast, and accessible to any candidate, including those with no knowledge of psychometrics or interview techniques. Artificial intelligence massively democratizes sophisticated cheating, on a scale the labor market has never seen before.

3. Why your current assessment tools are vulnerable (to varying degrees)

Let's take an honest inventory. Here is how AI affects each tool used in recruitment today.

CVs and cover letters: massive vulnerability, illusory detection

CVs and cover letters have always been fakeable. AI simply makes their production more accessible and more massive: anyone with no prior experience can now generate a credible application package in a matter of minutes. The classic countermeasures (diploma verification, reference checks, background screening) are catch-up solutions, not preventive ones. They are not applied systematically to all candidates, and they arrive too late in the recruitment process.

Personality questionnaires: critical vulnerability, rarely discussed

This is probably the most exposed tool, and paradoxically the least discussed. A demonstration given during the webinar (available in the replay) illustrates the problem clearly: ChatGPT is given a Big Five questionnaire and a one-line instruction: "answer so that the profile matches a frontline manager position in industry." Within seconds, the AI identifies the questionnaire's structure, spots reversed items, and calibrates social desirability to produce a credible profile (4.78/5 on conscientiousness, 1.5/5 on neuroticism rather than 1, which would be too perfect).

This empirical demonstration is confirmed by research. Philippe and Robby (2024) compared GPT-4 to 655 students who were explicitly asked to fake a psychometric questionnaire. The result: GPT-4 fakes a personality questionnaire better than 99.6% of humans on a classic Likert format, and better than 91.8% of humans on a forced-choice format, even though the latter was specifically designed to resist faking.

And detection? Very difficult. Completion time can be a clue, but it is drowned out by many other factors. There is currently no reliable method for detecting that a remote candidate used AI to answer a personality questionnaire. The machine learning algorithms that claim to do so have false positive rates too high to be deployed in production without legal risk.

Remote technical tests: rising and documented vulnerability

The figures speak for themselves. The attempted fraud rate on technical recruitment assessments rose from 16% in 2024 to 35% in 2025, with peaks of up to 40% for entry-level profiles. The market's response is telling: as early as mid-2025, Google and McKinsey reintroduced mandatory face-to-face interviews specifically to counter this phenomenon. An interesting paradox: these same companies otherwise encourage AI use at work. The tendency to ban in interviews what is allowed in daily work creates a legitimate tension that every employer must now navigate.

Cognitive tests: variable assistability depending on the type of reasoning

Abdel Karim and colleagues (2025) compared 18 AI models on cognitive tests. The results vary significantly depending on the dimension assessed. On verbal reasoning, GPT-4 reaches 79% performance: highly assistable. On numerical reasoning, between 20 and 53%: mixed. On visuospatial reasoning (e.g., progressive matrices), 22.5%: still moderate but progressing steadily.

A demonstration on visual matrices, presented in the webinar, shows that AI can identify some rules (number of elements, layout) but miss subtleties (color nuances, rotations, overlays). What resists today probably will not resist eighteen months from now. The new generation of reasoning models is advancing rapidly on these tasks.

Remote interviews: virtually undetectable real-time assistance

The remote video interview has become the norm for obvious logistical reasons: more flexibility, access to a wide range of talent regardless of location, time savings for both recruiter and job seeker. But this format offers an ideal attack surface. Real-time assistance tools (via phone, a discreet audio earpiece, or a display on a second screen) leave few clues: latency, averted gaze, flat tone. And latency, the main signal, will probably disappear as models evolve.

All these vulnerabilities share the same structural flaw: they assess what the candidate says or writes, the declarative, which is precisely what language models are best at generating. This observation paves the way for a paradigm shift.

4. The pivot: the fidelity continuum

Here is the angle no one is addressing, and which changes everything: if AI can cheat on an assessment, it is because that assessment measures something an AI can produce and therefore, by definition, something that is not a distinctive human competency in a work situation.

What AI cannot do (and probably will not do tomorrow either)

A paper published at ICLR 2026 (Young et al.), one of the most prestigious AI conferences, evaluated 10 language models, including the most advanced from OpenAI, Anthropic and Google, on around 4,000 reasoning cases classified by cognitive level. The results form a clear staircase:

  • Level 1 (recognizing a simple attribute, counting elements, what an MCQ typically tests): ~57% success rate
  • Level 2 (spatial reasoning, rotations, symmetry): 33%
  • Level 3 (sequential reasoning under constraint, planning, categorization): 20%
  • Level 4 (conceptual reasoning: applying abstract principles to concrete situations): nearly 0%, even for the best models

In concrete terms: AI succeeds at what an MCQ does. It fails at what a manager does when handling a conflict in real time while factoring in the individual characteristics of the people involved, or when prioritizing under pressure in an unpredictable environment. This is the fundamental gap between a generated response and a lived decision.

The structuring principle: the fidelity continuum

From this research emerges a principle that structures the robustness of all assessment methods: the more faithfully a method reproduces the actual work situation, the harder it is to falsify, whether by a human or by an AI. This is also referred to as ecological validity.

From most vulnerable to most robust:

MCQs and self-report questionnaires → text-based situational judgment tests → video-based scenarios → immersive simulations → in-vivo assessment center

The reason is simple and hard to circumvent: AI cannot live the situation in the candidate's place. It cannot make a real-time decision, interact with an unpredictable counterpart, manage an emotion, prioritize under stress, or reason in multiple layers about events that occurred a few minutes earlier in a simulation. No autopilot, no third-party application can substitute for it.

The assessment center: maximum robustness, real logistical constraint

The assessment center, a battery of role-play exercises in which the candidate is confronted with realistic job scenarios (announcing an unpopular change, conducting a one-on-one with a team member, prioritizing notifications, planning trade-offs), reaches a predictive validity of 0.53 for predicting managerial potential, in the upper range of the scientific literature. Compared with 0.42 for a structured interview and 0.19 for an unstructured one (Schmidt et al., 2022).

It is the most AI-robust method available today. Its historical limitation: a cost and logistical heaviness that restricted its use to senior executive positions and large companies.

The strategic question, then: can we preserve the role-play logic that makes the assessment center so powerful, and make it accessible at a larger scale, across a wide range of positions? This is precisely the problem Yuzu is trying to solve with its Digital Assessment Center, combining the psychometric robustness of the historical format with the scalability of digital.

5. Action plan: auditing and redesigning your process

Beyond the diagnosis, here are three concrete steps to implement starting next week to prepare your recruitment process for the AI era.

Step 1: Audit your current process along 3 axes

For each stage of your recruitment process, ask yourself these three questions:

  1. Declarative or observed? Does the candidate describe what they would do, or do something observable? The more the process relies on the declarative, the more it is exposed to AI.
  2. Single output or cascading decisions? Does the candidate deliver a fixed final answer, or string together micro-decisions in real time? AI is comfortable producing a deliverable. It is much less comfortable chaining adaptive decisions under constraint.
  3. Asynchronous or synchronous under constraint? Does the exercise take place "whenever the candidate wants, however they want," or does it impose real-time constraints? Synchronous constraints (in person or otherwise) constitute a major lever for robustness.

If your answers consistently land on the first option of each question (declarative, single output, asynchronous), your process is probably vulnerable.
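For teams that want to make this audit systematic, the three questions above can be turned into a simple scoring grid. The sketch below is purely illustrative: the stage names, axis flags, and labels are assumptions for the example, not a validated psychometric model.

```python
# Hypothetical audit sketch: score each stage of a hiring process on the
# three axes above. A stage earns one point per axis that leans toward the
# AI-exposed side (declarative, single output, asynchronous).

STAGES = [
    # (stage name, declarative?, single output?, asynchronous?)
    ("CV screening",              True,  True,  True),
    ("Personality questionnaire", True,  True,  True),
    ("Remote technical test",     False, True,  True),
    ("Live role-play exercise",   False, False, False),
]

def vulnerability_score(declarative: bool, single_output: bool, asynchronous: bool) -> int:
    """Count the axes that lean toward AI exposure (0 = robust, 3 = exposed)."""
    return sum([declarative, single_output, asynchronous])

LABELS = ["robust", "mostly robust", "exposed", "highly exposed"]

for name, d, s, a in STAGES:
    score = vulnerability_score(d, s, a)
    print(f"{name}: {score}/3 ({LABELS[score]})")
```

A grid like this will not replace judgment, but it makes the audit repeatable across recruiters and makes it visible where robustness is concentrated in the funnel.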

Step 2: Apply three redesign principles

  • Match the method to the stakes. The greater the economic and managerial impact of a hiring decision, the more the robustness of the method should be prioritized. There is no point in running an assessment center for an intern; it is essential for a future BU director.
  • Do not exclude vulnerable methods. They retain their value at the top of the funnel: CV, questionnaire, first video interview to filter volume. They help narrow the field before the decisive stages.
  • Concentrate robustness on the shortlist and tie-breaker stages. This is where the final decision is made, and therefore where investment in robust methods yields the best ROI for the company.

Step 3: Choose a path forward

Three paths, depending on your maturity and resources:

  1. Strengthen what exists: Further structure your interviews (moving from 0.19 to 0.42 in predictive validity), add face-to-face stages on the shortlist, set up a standardized assessment grid shared across all recruiters. At near-zero cost, these actions already significantly improve the quality of your decisions.
  2. Outsource: Delegate the most technical stages to an assessment center consultancy. The industry has been renewed in France and now offers partially digitalized formats, less logistically heavy than before.
  3. Internalize: Embed role-play exercises directly into your process, through a platform built for that purpose. This is the most demanding option, but also the one that makes your process most defensible in the long term, both psychometrically and from a legal standpoint (GDPR, AI Act, non-discrimination).

Whichever path you choose, explicitly document your policy on candidate AI use: what is allowed, what is prohibited, what falls in the gray zone. Failing to do so creates legal ambiguity and exposes the company.

Conclusion: it's not about banning AI, but about knowing the strengths and weaknesses of each method

Artificial intelligence has not created new flaws in recruitment. It has amplified existing ones and made their detection nearly impossible in remote formats. Trying to ban ChatGPT or Claude for candidates is a losing battle, both legally and technically. Trying to detect their use is just as futile: no current anti-cheating algorithm offers enough reliability to ground a defensible rejection decision.

The real answer lies neither in prohibition nor in detection. It lies in gradually shifting your assessment methods toward the right of the fidelity continuum: less declarative content, more observation; fewer single answers, more cascading decisions; fewer unconstrained asynchronous exercises, more synchronous constraints. Human intervention, structured and augmented by the right tools, remains at the heart of quality recruitment.

This is precisely the focus of the work being done at Yuzu, and it is also the bet that recruitment science has been making for forty years. AI does not change what predicts performance at work; it simply forces us to stop pretending that MCQs and unstructured video interviews are enough to identify the right talent.

In an era where generative AI is transforming the labor market across every function, the quality of your assessment methods deserves to become your true employer differentiator. Individual merit can only express itself if your tools allow it to.