The researchers demonstrated through controlled experiments that LLMs consistently prefer AI-generated resumes over human-written ones, even when qualifications are held constant. The bias was consistent across models, prompt formulations, and job categories, and extended beyond self-preference — models preferred output from other LLMs too, suggesting convergence on an 'AI house style' that models reward.
The research was submitted to Hacker News, where it gained 312 points and 166 comments; the discussion highlighted the finding that LLMs consistently pick resumes they generate over ones written by humans or other models as a significant concern for automated hiring pipelines.
The editorial argues that the preference isn't about better qualifications but about LLMs recognizing and rewarding a specific writing pattern — structured, keyword-dense, grammatically polished in a particular way — that AI text generators converge on. This 'AI house style' matches what models' training data associates with 'good,' creating a bias rooted in form rather than substance.
The editorial emphasizes that this isn't a theoretical concern but a structural problem already embedded in default hiring practices at thousands of companies. With over 75% of Fortune 500 companies using automated resume screening and a growing share incorporating LLMs, the bias creates a two-sided fairness problem affecting real candidates at massive scale.
Researchers studying LLM-based resume screening found something that should alarm every engineering leader with an AI-assisted hiring pipeline: when large language models evaluate resumes, they consistently prefer resumes generated by LLMs over those written by humans — even when the underlying qualifications are held constant.
The study, published on arXiv (2509.00462) and now trending on Hacker News with 300+ upvotes, tested multiple leading models in a controlled resume-screening setup. The researchers created parallel resume sets — same candidate profiles, same qualifications, same experience — but varied whether the resume text was written by a human, generated by one LLM, or generated by a different LLM. The models were then asked to rank, score, or select candidates.
The result was unambiguous: LLMs picked AI-generated resumes at rates significantly above chance. This wasn't a marginal effect buried in statistical noise. It was consistent across models, across prompt formulations, and across job categories.
The finding exposes a structural problem in what has become a default practice at thousands of companies. According to industry estimates, over 75% of Fortune 500 companies now use some form of automated resume screening, and a growing share of those systems incorporate LLMs — either directly for ranking or as components in agentic hiring workflows.
The bias isn't just self-preference in the narrow sense (GPT-4 preferring GPT-4 output). Models also preferred resumes generated by *other* LLMs over human-written ones. This suggests that LLMs have learned to recognize and reward a particular style of writing — structured, keyword-dense, grammatically polished in a specific way — that AI text generators converge on. Call it the "AI house style." It's not that the content is better. It's that the pattern matches what the model's training data associates with "good."
This creates a two-sided fairness problem. On one side: candidates who use ChatGPT, Claude, or similar tools to polish their resumes get an invisible advantage that has nothing to do with their actual qualifications. On the other side: candidates who write their own resumes — perhaps because they're better writers, or because they come from backgrounds where AI tool adoption is lower — get penalized for being human.
The Hacker News discussion surfaced the predictable but important corollary: this is already a feedback loop. Candidates know (or suspect) that resumes are screened by AI. So they optimize for AI. Career coaches now routinely advise "run your resume through ChatGPT." The screener AI rewards the applicant AI's output. The system is converging on a world where the optimal resume is one no human wrote and no human reads.
Several commenters with hiring experience pointed out that this mirrors a pattern already known in SEO: when you optimize content for an algorithm rather than a reader, you get technically competent text that humans find strangely hollow. The difference is that in hiring, the stakes are someone's livelihood.
Why does this happen? The most likely explanation is distributional: LLMs generate text that sits in a high-probability region of their own output distribution. When the same (or similar) model evaluates that text, it assigns it higher likelihood — not because it's "better" by any external standard, but because it's more *expected*. Human writing, with its idiosyncrasies, varied sentence structures, and occasional roughness, falls in a different region of the distribution. The model doesn't dislike it, exactly. It just doesn't recognize it as strongly.
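To make that intuition concrete, here is a minimal sketch of how you could probe it yourself, assuming a local causal LM via Hugging Face transformers. The model name and the two example sentences are placeholders, and this illustrates the likelihood argument, not the paper's methodology.

```python
# Sketch: compare how "expected" two resume sentences are under a causal LM.
# Lower perplexity = higher-probability text. Model name and example strings
# are placeholders; this illustrates the intuition, not the paper's setup.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL = "gpt2"  # placeholder; any causal LM works
tok = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForCausalLM.from_pretrained(MODEL).eval()

def perplexity(text: str) -> float:
    """Average per-token perplexity of text under the model."""
    ids = tok(text, return_tensors="pt").input_ids
    with torch.no_grad():
        loss = model(ids, labels=ids).loss  # mean next-token cross-entropy
    return float(torch.exp(loss))

human = "Led the billing rewrite; shipped v2 in nine months, on call the whole time."
ai_style = "Spearheaded cross-functional initiatives to deliver scalable, impactful solutions."
print(f"human: {perplexity(human):.1f}  ai-style: {perplexity(ai_style):.1f}")
```

If the distributional explanation holds, the AI-styled sentence should sit in a higher-probability (lower-perplexity) region than the idiosyncratic human one.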
This is related to the well-documented phenomenon of LLM self-evaluation bias, where models rate their own outputs higher in quality benchmarks. But the resume context makes it concrete and consequential. A benchmark preference is academic. A hiring preference is legal exposure.
There's also a keyword-density effect. LLM-generated resumes tend to be more systematic about incorporating role-specific terminology — not because the candidate knows the jargon, but because the model was trained on job descriptions that use it. An LLM screener recognizes those patterns because it was trained on the same corpus. It's a closed loop of mutual recognition.
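A crude, illustrative way to quantify that loop is a keyword-density score against a role's canonical terminology. The term set and example sentence below are invented for illustration:

```python
# Sketch: crude keyword-density score of a resume against role terminology.
# The term set and example text are invented for illustration.
import re

ROLE_TERMS = {"kubernetes", "microservices", "ci/cd", "cross-functional",
              "scalable", "observability", "stakeholders"}

def keyword_density(resume: str, terms: set[str]) -> float:
    """Fraction of tokens that hit the role's terminology."""
    words = re.findall(r"[a-z0-9/\-]+", resume.lower())
    return sum(w in terms for w in words) / max(len(words), 1)

print(keyword_density("Built scalable microservices on Kubernetes with "
                      "CI/CD pipelines for cross-functional teams.", ROLE_TERMS))
```

LLM-generated resumes tend to score high on exactly this kind of metric, and an LLM screener trained on the same job-description corpus rewards the same surface pattern.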
If you're building or operating an AI-assisted hiring pipeline, this paper is a direct challenge to your fairness claims. Here's what to do about it:
Audit your screening layer. If you use an LLM anywhere in resume ranking — even as a "helper" that feeds scores to a human reviewer — you need to test for this bias. The methodology is straightforward: take a set of real resumes, generate AI-rewritten versions with identical content, and compare scores. If the AI versions consistently score higher, you have a problem.
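A minimal sketch of that audit, assuming you already have matched human/AI resume pairs and SciPy available; `score_resume` is a hypothetical stub you would wire to your own screener:

```python
# Sketch of the audit described above: score matched human/AI resume pairs
# and test whether the AI versions win more often than chance would allow.
# score_resume is a placeholder for a call into your actual screener.
from scipy.stats import binomtest

def score_resume(text: str) -> float:
    """Placeholder: call your screening LLM and return its numeric score."""
    raise NotImplementedError("wire this to your screening pipeline")

def audit(pairs: list[tuple[str, str]]) -> None:
    """pairs: (human_written, ai_rewritten) resumes with identical content."""
    ai_wins = sum(score_resume(ai) > score_resume(human) for human, ai in pairs)
    # One-sided sign test: with no style bias, the AI rewrite should win
    # about half the time.
    result = binomtest(ai_wins, n=len(pairs), p=0.5, alternative="greater")
    print(f"AI version preferred in {ai_wins}/{len(pairs)} pairs, "
          f"p = {result.pvalue:.4f}")
```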
Consider blind evaluation designs. Just as orchestras adopted blind auditions to reduce gender bias, resume screening systems may need to strip or normalize stylistic signals before LLM evaluation. The engineering fix is to preprocess resumes into a standardized structured format (JSON, not prose) before the model sees them, so the system evaluates qualifications as structured data rather than prose style.
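One shape that normalization could take; the schema below is an illustrative assumption, not a standard or the paper's design:

```python
# Sketch: normalize every resume into one fixed schema before ranking, so
# the evaluator never sees prose style. The fields are illustrative.
import json
from dataclasses import dataclass, asdict

@dataclass
class NormalizedResume:
    years_experience: float
    titles: list[str]          # job titles, most recent first
    skills: list[str]          # canonicalized skill keywords
    education: list[str]       # degree and institution
    certifications: list[str]

def to_screening_payload(resume: NormalizedResume) -> str:
    """What the ranking model sees: sorted, style-free JSON."""
    payload = asdict(resume)
    payload["skills"] = sorted({s.lower() for s in payload["skills"]})
    return json.dumps(payload, indent=2, sort_keys=True)
```

Extraction into this schema can itself be LLM-assisted; the point is that the ranking step only ever sees the normalized form, never the original prose.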
Don't assume prompt engineering solves it. The study tested multiple prompt formulations, including explicit instructions to ignore writing style. The bias persisted. This is a property of the model's learned representations, not something you can instruct away with a system prompt.
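Before relying on any instruction-based mitigation, reproduce the result on your own stack. The sketch below sweeps the paired audit across prompt variants; the prompts are invented examples and `score_with_prompt` is a stub for your screener, not a real API.

```python
# Sketch: sweep the paired audit across debiasing prompt variants to see
# whether instructions move the gap. Prompts are invented examples;
# score_with_prompt is a placeholder for your own screening call.
PROMPT_VARIANTS = [
    "Rank this candidate's fit for the role.",
    "Evaluate qualifications only. Ignore writing style, tone, and formatting.",
    "You are a fair hiring reviewer. Judge substance, not presentation.",
]

def score_with_prompt(system_prompt: str, resume: str) -> float:
    """Placeholder: call your screening LLM with this system prompt."""
    raise NotImplementedError("wire this to your screening pipeline")

def ai_win_rate(pairs: list[tuple[str, str]], system_prompt: str) -> float:
    """Fraction of (human, ai) pairs where the AI rewrite scores higher."""
    wins = sum(score_with_prompt(system_prompt, ai)
               > score_with_prompt(system_prompt, human)
               for human, ai in pairs)
    return wins / len(pairs)

# Per the study, expect win rates to stay well above 0.5 for every variant.
```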
Watch the legal landscape. NYC's Local Law 144 already requires bias audits for automated employment decision tools. The EU AI Act classifies AI hiring systems as "high-risk." A documented, reproducible bias like LLM self-preference is exactly the kind of finding that regulators and plaintiffs' attorneys will cite. If your system has this bias and you knew about it (you do now), the liability argument writes itself.
This paper lands at a moment when LLM-assisted hiring is accelerating, not slowing down. The economic pressure to automate screening is real — large employers receive thousands of applications per role, and human reviewers are expensive and inconsistent. But "consistent bias" is not an improvement over "inconsistent human judgment." The industry needs to treat LLM resume screening the way it should have treated earlier keyword-matching ATS systems: as a tool that requires adversarial testing, ongoing monitoring, and structural safeguards — not as a neutral oracle. The resumes are being written by AI. The screeners are AI. Somebody human needs to be watching the loop.
Anecdata, sample size of one: When I was looking for my next role after being laid off, I didn’t get much of a response with my human handmade resume despite my experience. Just for kicks, I asked ChatGPT to “Analyze my resume and give it a score for what percentage it was in” then I asked it to revise…
Intuitively this feels obvious. Content generated by the model will be shaped by its training; therefore, when reading it back, it will resonate with that same training and have a positive view as a result.
Human when preparing a CV: "Make my CV more professional"
LLM many days later presentin…
We are, without our consent, introducing a party in between people. The models become the arbiters of who does and does not get a job. It feels problematic.
So just to test, I loaded qwen/qwen3-v1-30b locally, fed it my 100% human-written resume, and asked it "Make this resume more professional". Mucho bullets came out. My sentence "I specialized in enterprise data modeling and worked on Cost of Goods Sold optimizations across entire cus…
I'll copy what I wrote on LinkedIn (note: I read roughly 25 pages, which is half the paper, and read it quickly)[0]: "If I read the paper correctly, they don’t actually show that LLMs prefer resumes they generate. Their actual method seems to be taking a human written resume, deleting the ex…