arXiv Will Ban You for a Year If Your Citations Are Hallucinated

What happened

arXiv, the preprint server that hosts over 2.4 million papers and serves as the de facto distribution channel for research in physics, mathematics, computer science, and adjacent fields, has introduced a policy targeting a distinctly modern problem: hallucinated references. Authors caught submitting papers with fabricated citations — references that point to papers that don't exist — now face a 1-year ban from the platform.

The policy was highlighted by Tom Dietterich, a foundational figure in machine learning and former president of AAAI, who shared the announcement on Twitter. The Hacker News discussion that followed drew significant engagement (400+ points), reflecting how deeply this issue resonates across the research and engineering communities.

The core of the policy is blunt: if your paper cites work that doesn't exist, you're out for twelve months. No exceptions carved out for "the LLM generated it" or "I didn't verify the bibliography." The responsibility lands squarely on the submitting author.

Why it matters

This isn't arXiv being reactionary. It's arXiv catching up with a problem that has been quietly metastasizing since GPT-3.5 made it trivially easy to generate plausible-looking academic text. LLMs are notorious for fabricating citations — they'll confidently produce author names, journal titles, volume numbers, and page ranges for papers that have never been written. The outputs look real enough to pass a casual glance, and that's exactly the problem.

The scale of the issue is hard to pin down precisely, but the signals all point the same way. Peer reviewers report encountering fabricated references with increasing frequency. A cottage industry of "citation verification" tools has emerged. And some journals have reportedly retracted papers after discovering that key references were entirely invented.

What makes arXiv's response significant is that arXiv isn't a journal; it's infrastructure. It doesn't peer-review papers. It's a hosting platform. Infrastructure policing content quality at this level marks a meaningful shift in how the academic ecosystem is responding to AI-generated artifacts. When your hosting provider starts enforcing quality gates, you know the problem has crossed a threshold.

The 1-year ban is also notable for its severity. arXiv could have opted for a warning system, a flagging mechanism, or a requirement to re-submit with corrections. Instead, they chose a punishment that has real career consequences. For researchers on the job market, in a grant cycle, or on a tenure clock, losing arXiv access for a year is not trivial. It means your work becomes effectively invisible to the community that matters most during exactly the period when visibility counts.

There's an argument that this is too harsh — that honest mistakes happen, that authors might unknowingly include a hallucinated citation from a collaborator's draft, that the line between a genuinely misremembered reference and an AI-fabricated one is blurry. These are fair points. But the counterargument is stronger: citations are the connective tissue of academic knowledge, and fabricated ones don't just waste a reader's time — they erode the trust infrastructure that makes preprint servers viable in the first place.

What this means for your stack

If you're a practitioner who publishes research, contributes to academic papers, or maintains open-source projects with academic documentation, the implications are concrete.

First, if you use LLMs to draft literature reviews or related work sections, you now need a non-negotiable verification step. Every single citation needs to be checked against an actual database (Google Scholar, Semantic Scholar, DBLP) or the source journal itself. This isn't optional due diligence; it's a requirement with teeth. Tools like Semantic Scholar's API can automate some of this, but the responsibility is yours.
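As a rough illustration of what an automated first pass might look like, here is a minimal Python sketch that fuzzy-matches a cited title against titles returned by a search backend. The function names, the 0.9 similarity threshold, and the use of title matching alone are all assumptions made for this example. The `semantic_scholar_search` helper assumes the public Semantic Scholar Graph API paper search endpoint and its `data` response field; check the current API documentation before relying on it.

```python
import json
import re
import urllib.parse
import urllib.request
from difflib import SequenceMatcher
from typing import Callable, Iterable, List, Optional, Tuple


def normalize(title: str) -> str:
    """Lowercase and drop punctuation so formatting differences don't matter."""
    return " ".join(re.sub(r"[^a-z0-9]+", " ", title.lower()).split())


def best_match(cited_title: str, candidates: Iterable[str]) -> Tuple[Optional[str], float]:
    """Return the candidate closest to the cited title and its similarity in [0, 1]."""
    target = normalize(cited_title)
    best, best_score = None, 0.0
    for candidate in candidates:
        score = SequenceMatcher(None, target, normalize(candidate)).ratio()
        if score > best_score:
            best, best_score = candidate, score
    return best, best_score


def citation_exists(
    cited_title: str,
    search: Callable[[str], Iterable[str]],
    threshold: float = 0.9,  # assumed cutoff; tune it on your own bibliography
) -> bool:
    """True if the search backend returns a title close enough to the cited one."""
    _, score = best_match(cited_title, search(cited_title))
    return score >= threshold


def semantic_scholar_search(title: str, limit: int = 5) -> List[str]:
    """ASSUMPTION: queries the Semantic Scholar Graph API paper search endpoint.

    Verify the URL, parameters, and rate limits against the official docs.
    """
    query = urllib.parse.urlencode({"query": title, "fields": "title", "limit": limit})
    url = f"https://api.semanticscholar.org/graph/v1/paper/search?{query}"
    with urllib.request.urlopen(url, timeout=10) as response:
        payload = json.load(response)
    return [paper["title"] for paper in payload.get("data", []) if paper.get("title")]
```

Calling `citation_exists(title, semantic_scholar_search)` would run the check against the live index. A match on title alone does not confirm the authors, year, or venue, so a hit here narrows the manual check rather than replacing it, and a miss may only mean the paper is not indexed.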

Second, this policy will likely propagate. arXiv tends to set norms that conferences and journals follow. If you're building internal tooling or workflows that involve AI-assisted writing for technical documents, now is the time to add citation verification as a pipeline step, not an afterthought. The same principle applies to technical blog posts, documentation, and any content where fabricated references could damage credibility.
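One way to treat verification as a pipeline step is to fail the build when any bibliography entry cannot be confirmed. The sketch below is an assumption-laden illustration, not a full BibTeX parser: it only handles entries whose `title` field sits on a single line, and `exists` is a hypothetical callable you supply that decides whether a title corresponds to a real record.

```python
import re
from typing import Callable, Dict, List

# Start of an entry, e.g. "@article{smith2020,"
ENTRY_RE = re.compile(r"@(\w+)\s*\{\s*([^,\s]+)\s*,")
# A single-line title field: title = {...} or title = "..."
TITLE_RE = re.compile(r'^\s*title\s*=\s*[{"](.+?)[}"]\s*,?\s*$', re.IGNORECASE | re.MULTILINE)


def extract_titles(bibtex: str) -> Dict[str, str]:
    """Map each citation key to its title ("" if no single-line title was found)."""
    titles: Dict[str, str] = {}
    entries = list(ENTRY_RE.finditer(bibtex))
    for i, entry in enumerate(entries):
        # The entry body runs until the next entry header or the end of the file.
        end = entries[i + 1].start() if i + 1 < len(entries) else len(bibtex)
        found = TITLE_RE.search(bibtex[entry.end():end])
        raw = found.group(1) if found else ""
        # Drop protective braces such as {BERT} inside the title.
        titles[entry.group(2)] = raw.replace("{", "").replace("}", "").strip()
    return titles


def check_bibliography(bibtex: str, exists: Callable[[str], bool]) -> List[str]:
    """Return keys whose title is missing or could not be verified."""
    return [
        key
        for key, title in extract_titles(bibtex).items()
        if not title or not exists(title)
    ]


def run_gate(path: str, exists: Callable[[str], bool]) -> int:
    """Print unverified keys and return a process exit code (0 = pass, 1 = fail)."""
    with open(path, encoding="utf-8") as handle:
        unverified = check_bibliography(handle.read(), exists)
    for key in unverified:
        print(f"UNVERIFIED: {key}")
    return 1 if unverified else 0
```

In a CI job, something like `sys.exit(run_gate("refs.bib", exists))` would block the build until every flagged key has been looked at by a person. Entries without a parseable title are flagged rather than skipped, on the assumption that a gate should fail closed.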

Third, this is a useful case study in AI liability. The policy doesn't care whether the hallucination was generated by Claude, GPT-4, Gemini, or a human with a bad memory. The author is responsible. This "author-as-final-validator" model is likely to become the default across domains — and it has implications for how teams should structure their AI-assisted workflows. Every AI output that gets published needs a human verification gate, and that gate needs to be more than a cursory skim.
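To make the "human verification gate" idea concrete, here is a small hypothetical sketch in Python. It records who checked each citation and where they found it, and refuses to publish while any citation lacks a sign-off. The class and function names are invented for illustration; nothing here reflects how arXiv or any particular tool enforces the policy.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Citation:
    key: str
    title: str
    verified_by: Optional[str] = None  # the human who located the source
    source_url: Optional[str] = None   # where it was found (DOI, publisher page, ...)

    def sign_off(self, reviewer: str, source_url: str) -> None:
        """Record that a named person found this reference at a specific location."""
        if not reviewer.strip() or not source_url.strip():
            raise ValueError("sign-off needs both a reviewer and a source location")
        self.verified_by = reviewer
        self.source_url = source_url


class UnverifiedCitations(Exception):
    """Raised when publication is attempted with citations nobody has checked."""


def publish_gate(citations: List[Citation]) -> None:
    """Raise unless every citation carries a human sign-off."""
    pending = [c.key for c in citations if c.verified_by is None]
    if pending:
        raise UnverifiedCitations("no human sign-off for: " + ", ".join(pending))
```

Requiring a source location alongside the reviewer's name is a deliberate choice in this sketch: it makes a cursory skim harder to pass off as verification, because the reviewer has to point at where the paper actually lives.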

For teams building AI-assisted research tools, there's also a product opportunity here. Citation verification could become a standard feature in academic writing assistants, similar to how grammar checkers became table stakes for word processors. The market signal from arXiv is clear: the demand for reliable citation checking is only going to grow.

Looking ahead

arXiv's policy is one of the first concrete examples of an institution drawing a bright line around AI-generated content quality — not by banning AI use, but by holding humans accountable for the output. That's a more sustainable model than blanket AI bans, and it's one we'll likely see replicated across academic publishing, regulatory filings, and legal documents. The era of "generate and ship" is giving way to "generate, verify, and own." For practitioners, the takeaway is straightforward: your name on the paper means the citations are your problem, regardless of who — or what — wrote them.

// read from source

New arXiv policy: 1-year ban for hallucinated references
