A look at why the hiring process keeps failing — and why the technology meant to fix it often deepens the problem.
There is a moment most professionals recognise. You leave an interview feeling certain — certain you nailed it, or certain you didn’t. The conversation felt real. The rapport felt genuine. And then the outcome makes no sense.

This is not a personal failure of intuition. It is a structural failure of the interview as an instrument.
The data has been clear for decades
In 1998, psychologists Frank Schmidt and John Hunter published one of the most comprehensive meta-analyses in the history of personnel research, covering 85 years of data on how well different selection methods actually predict job performance. Their finding on unstructured interviews was uncomfortable: a validity coefficient of 0.38, on a scale where 1.0 would mean perfect prediction. Not zero, but not far from a coin toss when translated into real hiring decisions.
A 2022 re-analysis by Sackett and colleagues made the picture worse. Arguing that earlier estimates had been statistically overcorrected, they put unstructured interview validity at 0.19, less than half of structured interviewing’s 0.42. The gap between how interviews are run and how they should be run has not narrowed with time. It has widened.
Inter-rater reliability tells a similar story. When two interviewers evaluate the same candidate independently, they disagree roughly a quarter of the time on who the better candidate is, even when they sit in the same room, watch the same conversation, and hear the same answers.
Daniel Kahneman put a number on this in Noise: A Flaw in Human Judgment (2021): if all you know is that Candidate A impressed an interviewer more than Candidate B, the chance that A is actually the stronger hire is somewhere between 56% and 61%. Marginally better than guessing. The problem is not that interviewers are careless — it is that unstructured human judgment is inherently noisy, and the interview format amplifies that noise rather than containing it.
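There is a standard way to translate these validity coefficients into Kahneman’s framing. If interview scores and job performance are assumed to be bivariate normal with correlation r, the probability that the higher-scoring of two candidates is actually the stronger performer works out to 1/2 + arcsin(r)/π. A quick sketch applying that result to the figures above (the normality assumption is ours, not the studies’):

```python
from math import asin, pi

def percent_concordant(r: float) -> float:
    """Probability that the candidate with the higher interview score
    is also the stronger performer, assuming scores and performance
    are bivariate normal with correlation r (a standard
    orthant-probability result)."""
    return 0.5 + asin(r) / pi

for label, r in [("unstructured, Sackett et al. 2022", 0.19),
                 ("unstructured, Schmidt & Hunter 1998", 0.38),
                 ("structured, Sackett et al. 2022", 0.42)]:
    print(f"{label}: r = {r:.2f} -> {percent_concordant(r):.1%}")
```

At r = 0.19 this gives roughly 56%, the bottom of Kahneman’s range; even structured interviewing’s 0.42 only lifts it to about 64%. Every point of validity is fighting for a few percentage points of real-world accuracy.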
The most striking evidence comes from a 2013 study by Jason Dana, Robyn Dawes, and Nathanial Peterson published in Judgment and Decision Making. Participants were asked to predict students’ GPAs after conducting interviews. The twist: one group conducted normal interviews; another conducted interviews in which the students answered at random, so the answers bore no relationship to the truth about the student. The group with random answers made predictions just as confidently, and just as accurately, as the group with real interviews. The interview, in this case, added no information. But people believed it did.
This is not a bug in certain interviewers. It is a feature of how the human brain interprets conversation: it constructs a coherent narrative from available signals, confident that it is reading the person, when it is often reading itself.

Why is nothing changing?
Structured interviews, in which all candidates are asked the same questions, evaluated against the same criteria, and scored before any comparison, have consistently been shown to outperform unstructured ones: validity coefficients above 0.50, inter-rater agreement two to three times higher, and rejected candidates who report significantly better experiences of the process.
And yet adoption remains low. The reason is not ignorance of the research. It is that structure feels worse. It feels rigid. It feels like it gets in the way of the conversation. Hiring managers trust their read of a person more than a rubric, a phenomenon Scott Highhouse called “stubborn reliance on intuition and subjectivity” in his 2008 paper of the same name.
So structured interviewing exists, works, and is largely ignored. Into this gap, AI arrived.

AI entered promising measurement. It often delivered pattern-matching.
The pitch was reasonable: replace inconsistent human judgment with consistent algorithmic scoring. Remove bias. Increase speed. Scale hiring across thousands of candidates without proportionally scaling interviewer time.
The execution, in many cases, compounded the problem.
Amazon built an experimental hiring algorithm, trained on a decade of its own résumés, that scored candidates from one to five stars. Work began in 2014; by 2015, internal reviews had found it was systematically downranking résumés that included the word “women’s”, as in women’s chess club or women’s debate team, while rewarding verbs like “executed” and “captured” that are statistically more common in male-authored résumés. Amazon scrapped the project in 2017. Reuters broke the story in 2018. The system had learned the pattern of past hiring, and past hiring had been biased.
HireVue, one of the largest video interviewing platforms, offered AI analysis of candidates’ facial expressions, tone of voice, and word choice. The claim was that these signals predicted job performance. In 2021, following an FTC complaint from the Electronic Privacy Information Center and sustained public criticism, HireVue discontinued the facial analysis feature. The company acknowledged the technology “wasn’t worth the concern.” Independent researchers had found no valid evidence that facial microexpressions reliably predicted anything about job performance — the system was correlating appearance with hiring outcomes from historical data, not measuring competence.
These are not edge cases. They are illustrations of a pattern: AI systems trained on historical outcomes inherit historical biases, and optimise for them at scale. The speed advantage of AI becomes a speed advantage for systematic error.
Regulators noticed. Illinois passed the Artificial Intelligence Video Interview Act, which requires employers to notify candidates and obtain consent before using AI to analyse interview footage. New York City’s Local Law 144, in effect since July 2023, requires independent bias audits of any automated employment decision tool, with public disclosure of the results. The EEOC issued technical guidance on AI hiring tools in 2023, and more legislation is coming. The window for unaudited AI deployment in hiring is closing.
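The arithmetic at the centre of a Local Law 144-style bias audit is simple enough to sketch: compute each group’s selection rate, then divide it by the rate of the most-selected group to get an impact ratio. A minimal illustration, with hypothetical categories and data:

```python
import pandas as pd

# Hypothetical audit data: one row per candidate, with the group
# category being audited and whether the tool advanced them.
df = pd.DataFrame({
    "category": ["A", "A", "A", "A", "B", "B", "B", "B", "B"],
    "selected": [1, 1, 1, 0, 1, 0, 0, 1, 0],
})

# Selection rate per category, then each rate divided by the
# highest rate: the impact ratio the audit must disclose.
rates = df.groupby("category")["selected"].mean()
impact_ratios = rates / rates.max()
print(impact_ratios)
```

A ratio well below 1.0 for any category is the signal the audit is designed to surface; the EEOC’s long-standing four-fifths rule of thumb treats ratios below 0.8 as a flag for adverse impact.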
The problem was never human vs. machine
The framing of “AI vs. human judgment” has been a distraction. The actual problem is simpler and harder to solve: hiring lacks a shared measurement standard.
A structured interview is more predictive not because it is rigid, but because it produces comparable data. Same questions, same criteria, same scale. When two interviewers score a candidate against the same rubric, disagreements become visible and resolvable. When each interviewer runs a different conversation following their own instincts, there is nothing to compare — just impressions.
Most AI tools deployed in hiring have not solved this problem. They have automated the impression. They have turned a subjective human read into a faster, more confident, harder-to-question subjective algorithmic read. The output looks like a number. It carries the authority of computation. But if the input was unstructured and the model was trained on biased historical outcomes, the number is not a measurement — it is laundered intuition.
The cost of getting this wrong is not abstract. SHRM estimates the total cost of a bad hire at up to 50–60% of annual salary. For senior roles, substantially more. These are direct costs: rehiring, retraining, lost productivity, and team disruption. They do not include the cost to candidates who were evaluated unfairly, or the cost to companies that missed the right person because their process was too noisy to tell the difference.
What would actually work
The research points consistently in one direction: structure the measurement before you run the conversation.
Decide in advance what you are measuring. Define what a strong answer looks like versus a weak one, before you hear any answers. Ask every candidate the same questions. Score before you compare. Calibrate your scoring against known outcomes over time so you can identify when it is drifting.
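To make that concrete, here is a minimal sketch of what “decide in advance” looks like as data. The competency, question, and anchors are hypothetical placeholders; the point is that all of them exist before the first interview is scheduled:

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class Question:
    competency: str          # what this question is meant to measure
    text: str                # asked verbatim to every candidate
    anchors: dict[int, str]  # what each score means, fixed in advance

@dataclass
class Scorecard:
    candidate_id: str
    scores: dict[str, int] = field(default_factory=dict)  # recorded before any comparison

# Hypothetical rubric, defined before the first candidate walks in.
RUBRIC = [
    Question(
        competency="debugging",
        text="Walk me through the hardest production incident you have resolved.",
        anchors={1: "Vague narrative, no discernible method",
                 3: "Clear hypothesis-driven process",
                 5: "Process plus a systemic fix that prevented recurrence"},
    ),
]
```

Scores go onto the scorecard during or immediately after each interview, and candidates are compared only once every scorecard is complete.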
None of this is new. The research has supported structured, competency-based interviewing for decades. What is new is the computational tooling to make the structure scalable — to run the same rigour across hundreds of candidates, in parallel, without proportionally expanding interviewer workload.
The opportunity for AI in hiring is not to replace the interviewer. It is to give the interviewer a structured measurement framework they would never build themselves, and a consistent, auditable record of how every candidate was evaluated.
That is a harder product to build than an algorithm that watches faces. It requires designing the measurement layer before writing a single line of model code. It requires knowing what you are measuring, why it predicts performance, and how you will know if the scores are drifting. It requires calibration as an ongoing discipline, not a launch-day checkbox.
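Calibration can be as unglamorous as checking whether this quarter’s scores still look like the baseline’s. One way to detect drift is a two-sample Kolmogorov–Smirnov test; the scores and threshold below are illustrative:

```python
from scipy.stats import ks_2samp

# Hypothetical interview scores from a baseline period and the
# most recent quarter, on the same rubric and scale.
baseline = [3.2, 3.8, 2.9, 4.1, 3.5, 3.0, 3.7, 3.3]
recent   = [4.2, 4.5, 3.9, 4.6, 4.1, 4.4, 4.0, 4.3]

# If the score distribution has shifted, either the candidate pool
# changed or the measurement did; both are worth investigating.
stat, p_value = ks_2samp(baseline, recent)
if p_value < 0.05:
    print(f"Score drift detected (KS={stat:.2f}, p={p_value:.3f})")
```

A flagged shift does not say which side moved, the candidates or the measurement; it says a human should look.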
The AI hiring tools that will survive regulatory scrutiny and actually improve hiring outcomes are the ones built on that foundation. The ones that cannot explain why a candidate scored the way they did — in terms traceable to actual interview evidence — will face increasing pressure from regulators, candidates, and the companies that eventually find their hires are no better than before.
The interview is not going away
Despite everything, the interview will remain part of hiring. It is a human process. It signals something real about how a person thinks under pressure, engages with unfamiliar questions, and communicates about their work.
The question is whether we treat it as a conversation we vaguely remember and score from memory, or as a structured data collection event with defined inputs, measured outputs, and a feedback loop that makes the next one better.
For twenty-five years, the research has supported the latter. The tools to do it at scale exist. The only thing missing is the will to hold AI in hiring to the same standard we hold any other measurement system: show your work, validate your outputs, and get better over time.
If you can’t explain why a candidate scored the way they did, you didn’t measure them — you judged them.