The researchers demonstrated through controlled experiments that LLMs consistently prefer AI-generated resumes over human-written ones, even when qualifications are held constant. The bias was consistent across models, prompt formulations, and job categories, and extended beyond self-preference — models preferred output from other LLMs too, suggesting convergence on an 'AI house style' that models reward.
The research was submitted to Hacker News, where it gained 312 points and 166 comments; the discussion highlighted the finding that LLMs consistently pick resumes they generate over ones written by humans or other models as a significant concern for automated hiring pipelines.
The editorial argues that the preference isn't about better qualifications but about LLMs recognizing and rewarding a specific writing pattern — structured, keyword-dense, grammatically polished in a particular way — that AI text generators converge on. This 'AI house style' matches what models' training data associates with 'good,' creating a bias rooted in form rather than substance.
The editorial emphasizes that this isn't a theoretical concern but a structural problem already embedded in default hiring practices at thousands of companies. With over 75% of Fortune 500 companies using automated resume screening and a growing share incorporating LLMs, the bias creates a two-sided fairness problem affecting real candidates at massive scale.
Researchers studying LLM-based resume screening found something that should alarm every engineering leader with an AI-assisted hiring pipeline: when large language models evaluate resumes, they consistently prefer resumes generated by LLMs over those written by humans — even when the underlying qualifications are held constant.
The study, published on arXiv (2509.00462) and now trending on Hacker News with 300+ upvotes, tested multiple leading models in a controlled resume-screening setup. The researchers created parallel resume sets — same candidate profiles, same qualifications, same experience — but varied whether the resume text was written by a human, generated by one LLM, or generated by a different LLM. The models were then asked to rank, score, or select candidates.
The result was unambiguous: LLMs picked AI-generated resumes at rates significantly above chance. This wasn't a marginal effect buried in statistical noise. It was consistent across models, across prompt formulations, and across job categories.
The finding exposes a structural problem in what has become a default practice at thousands of companies. According to industry estimates, over 75% of Fortune 500 companies now use some form of automated resume screening, and a growing share of those systems incorporate LLMs — either directly for ranking or as components in agentic hiring workflows.
The bias isn't just self-preference in the narrow sense (GPT-4 preferring GPT-4 output). Models also preferred resumes generated by *other* LLMs over human-written ones. This suggests that LLMs have learned to recognize and reward a particular style of writing — structured, keyword-dense, grammatically polished in a specific way — that AI text generators converge on. Call it the "AI house style." It's not that the content is better. It's that the pattern matches what the model's training data associates with "good."
This creates a two-sided fairness problem. On one side: candidates who use ChatGPT, Claude, or similar tools to polish their resumes get an invisible advantage that has nothing to do with their actual qualifications. On the other side: candidates who write their own resumes — perhaps because they're better writers, or because they come from backgrounds where AI tool adoption is lower — get penalized for being human.
The Hacker News discussion surfaced the predictable but important corollary: this is already a feedback loop. Candidates know (or suspect) that resumes are screened by AI. So they optimize for AI. Career coaches now routinely advise "run your resume through ChatGPT." The screener AI rewards the applicant AI's output. The system is converging on a world where the optimal resume is one no human wrote and no human reads.
Several commenters with hiring experience pointed out that this mirrors a pattern already known in SEO: when you optimize content for an algorithm rather than a reader, you get technically competent text that humans find strangely hollow. The difference is that in hiring, the stakes are someone's livelihood.
Why does this happen? The most likely explanation is distributional: LLMs generate text that sits in a high-probability region of their own output distribution. When the same (or similar) model evaluates that text, it assigns it higher likelihood — not because it's "better" by any external standard, but because it's more *expected*. Human writing, with its idiosyncrasies, varied sentence structures, and occasional roughness, falls in a different region of the distribution. The model doesn't dislike it, exactly. It just doesn't recognize it as strongly.
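To make that intuition concrete, here is a minimal sketch of how you could probe it yourself, assuming a local causal LM via Hugging Face transformers. The model name and the two example sentences are placeholders, and this illustrates the likelihood argument, not the paper's methodology.

```python
# Sketch: compare how "expected" two resume sentences are under a causal LM.
# Lower perplexity = higher-probability text. Model name and example strings
# are placeholders; this illustrates the intuition, not the paper's setup.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL = "gpt2"  # placeholder; any causal LM works
tok = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForCausalLM.from_pretrained(MODEL).eval()

def perplexity(text: str) -> float:
    """Average per-token perplexity of text under the model."""
    ids = tok(text, return_tensors="pt").input_ids
    with torch.no_grad():
        loss = model(ids, labels=ids).loss  # mean next-token cross-entropy
    return float(torch.exp(loss))

human = "Led the billing rewrite; shipped v2 in nine months, on call the whole time."
ai_style = "Spearheaded cross-functional initiatives to deliver scalable, impactful solutions."
print(f"human: {perplexity(human):.1f}  ai-style: {perplexity(ai_style):.1f}")
```

If the distributional explanation holds, the AI-styled sentence should sit in a higher-probability (lower-perplexity) region than the idiosyncratic human one.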
This is related to the well-documented phenomenon of LLM self-evaluation bias, where models rate their own outputs higher in quality benchmarks. But the resume context makes it concrete and consequential. A benchmark preference is academic. A hiring preference is legal exposure.
There's also a keyword-density effect. LLM-generated resumes tend to be more systematic about incorporating role-specific terminology — not because the candidate knows the jargon, but because the model was trained on job descriptions that use it. An LLM screener recognizes those patterns because it was trained on the same corpus. It's a closed loop of mutual recognition.
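A crude, illustrative way to quantify that loop is a keyword-density score against a role's canonical terminology. The term set and example sentence below are invented for illustration:

```python
# Sketch: crude keyword-density score of a resume against role terminology.
# The term set and example text are invented for illustration.
import re

ROLE_TERMS = {"kubernetes", "microservices", "ci/cd", "cross-functional",
              "scalable", "observability", "stakeholders"}

def keyword_density(resume: str, terms: set[str]) -> float:
    """Fraction of tokens that hit the role's terminology."""
    words = re.findall(r"[a-z0-9/\-]+", resume.lower())
    return sum(w in terms for w in words) / max(len(words), 1)

print(keyword_density("Built scalable microservices on Kubernetes with "
                      "CI/CD pipelines for cross-functional teams.", ROLE_TERMS))
```

LLM-generated resumes tend to score high on exactly this kind of metric, and an LLM screener trained on the same job-description corpus rewards the same surface pattern.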
If you're building or operating an AI-assisted hiring pipeline, this paper is a direct challenge to your fairness claims. Here's what to do about it:
Audit your screening layer. If you use an LLM anywhere in resume ranking — even as a "helper" that feeds scores to a human reviewer — you need to test for this bias. The methodology is straightforward: take a set of real resumes, generate AI-rewritten versions with identical content, and compare scores. If the AI versions consistently score higher, you have a problem.
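A minimal sketch of that audit, assuming you already have matched human/AI resume pairs and SciPy available; `score_resume` is a hypothetical stub you would wire to your own screener:

```python
# Sketch of the audit described above: score matched human/AI resume pairs
# and test whether the AI versions win more often than chance would allow.
# score_resume is a placeholder for a call into your actual screener.
from scipy.stats import binomtest

def score_resume(text: str) -> float:
    """Placeholder: call your screening LLM and return its numeric score."""
    raise NotImplementedError("wire this to your screening pipeline")

def audit(pairs: list[tuple[str, str]]) -> None:
    """pairs: (human_written, ai_rewritten) resumes with identical content."""
    ai_wins = sum(score_resume(ai) > score_resume(human) for human, ai in pairs)
    # One-sided sign test: with no style bias, the AI rewrite should win
    # about half the time.
    result = binomtest(ai_wins, n=len(pairs), p=0.5, alternative="greater")
    print(f"AI version preferred in {ai_wins}/{len(pairs)} pairs, "
          f"p = {result.pvalue:.4f}")
```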
Consider blind evaluation designs. Just as orchestras adopted blind auditions to reduce gender bias, resume screening systems may need to strip or normalize stylistic signals before LLM evaluation. The engineering fix is to preprocess resumes into a standardized structured format (JSON, not prose) before the model sees them, so the system evaluates qualifications as structured data rather than prose style.
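One shape that normalization could take; the schema below is an illustrative assumption, not a standard or the paper's design:

```python
# Sketch: normalize every resume into one fixed schema before ranking, so
# the evaluator never sees prose style. The fields are illustrative.
import json
from dataclasses import dataclass, asdict

@dataclass
class NormalizedResume:
    years_experience: float
    titles: list[str]          # job titles, most recent first
    skills: list[str]          # canonicalized skill keywords
    education: list[str]       # degree and institution
    certifications: list[str]

def to_screening_payload(resume: NormalizedResume) -> str:
    """What the ranking model sees: sorted, style-free JSON."""
    payload = asdict(resume)
    payload["skills"] = sorted({s.lower() for s in payload["skills"]})
    return json.dumps(payload, indent=2, sort_keys=True)
```

Extraction into this schema can itself be LLM-assisted; the point is that the ranking step only ever sees the normalized form, never the original prose.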
Don't assume prompt engineering solves it. The study tested multiple prompt formulations, including explicit instructions to ignore writing style. The bias persisted. This is a property of the model's learned representations, not something you can instruct away with a system prompt.
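Before relying on any instruction-based mitigation, reproduce the result on your own stack. The sketch below sweeps the paired audit across prompt variants; the prompts are invented examples and `score_with_prompt` is a stub for your screener, not a real API.

```python
# Sketch: sweep the paired audit across debiasing prompt variants to see
# whether instructions move the gap. Prompts are invented examples;
# score_with_prompt is a placeholder for your own screening call.
PROMPT_VARIANTS = [
    "Rank this candidate's fit for the role.",
    "Evaluate qualifications only. Ignore writing style, tone, and formatting.",
    "You are a fair hiring reviewer. Judge substance, not presentation.",
]

def score_with_prompt(system_prompt: str, resume: str) -> float:
    """Placeholder: call your screening LLM with this system prompt."""
    raise NotImplementedError("wire this to your screening pipeline")

def ai_win_rate(pairs: list[tuple[str, str]], system_prompt: str) -> float:
    """Fraction of (human, ai) pairs where the AI rewrite scores higher."""
    wins = sum(score_with_prompt(system_prompt, ai)
               > score_with_prompt(system_prompt, human)
               for human, ai in pairs)
    return wins / len(pairs)

# Per the study, expect win rates to stay well above 0.5 for every variant.
```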
Watch the legal landscape. NYC's Local Law 144 already requires bias audits for automated employment decision tools. The EU AI Act classifies AI hiring systems as "high-risk." A documented, reproducible bias like LLM self-preference is exactly the kind of finding that regulators and plaintiffs' attorneys will cite. If your system has this bias and you knew about it (you do now), the liability argument writes itself.
This paper lands at a moment when LLM-assisted hiring is accelerating, not slowing down. The economic pressure to automate screening is real — large employers receive thousands of applications per role, and human reviewers are expensive and inconsistent. But "consistent bias" is not an improvement over "inconsistent human judgment." The industry needs to treat LLM resume screening the way it should have treated earlier keyword-matching ATS systems: as a tool that requires adversarial testing, ongoing monitoring, and structural safeguards — not as a neutral oracle. The resumes are being written by AI. The screeners are AI. Somebody human needs to be watching the loop.
Anecdata, sample size of one: When I was looking for my next role after being laid off, I didn’t get much of a response with my human handmade resume despite my experience. Just for kicks, I asked ChatGPT to “Analyze my resume and give it a score for what percentage it was in” then I asked it to revise…
Intuitively this feels obvious. Content generated by the model will be shaped by its training; therefore, when reading it back, it will resonate with that same training and have a positive view as a result.
Human when preparing a CV: "Make my CV more professional"
LLM many days later presentin…
We are, without our consent, introducing a party in between people. The models become the arbiters of who does and does not get a job. It feels problematic.
So just to test, I loaded qwen/qwen3-v1-30b locally, fed it my 100% human-written resume, and asked it "Make this resume more professional". Mucho bullets came out. My sentence "I specialized in enterprise data modeling and worked on Cost of Goods Sold optimizations across entire cus…
I'll copy what I wrote on LinkedIn (note: I read roughly 25 pages, which is half the paper, and read it quickly)[0]: "If I read the paper correctly, they don’t actually show that LLMs prefer resumes they generate. Their actual method seems to be taking a human written resume, deleting the ex…