A cop used ChatGPT to fabricate evidence. Now the chain of custody is a prompt.

What happened

Derbyshire Police has confirmed that one of its officers is under criminal investigation for allegedly using generative AI to create evidence across multiple cases. Sky News broke the story on the back of a referral to the Independent Office for Police Conduct (IOPC). The force has not named the officer, has not specified the model, and has not said which cases are affected — only that the conduct touched more than one investigation and that the force is reviewing prior work the officer handled.

This is the first publicly confirmed UK case of a serving police officer being investigated for fabricating evidence with a generative model, rather than merely using one as a writing aid. The distinction matters. Earlier UK guidance from the College of Policing and the National Police Chiefs' Council has focused on the relatively boring failure modes: officers pasting victim statements into ChatGPT to "tidy them up," or summarising interview transcripts with a hosted model and leaking PII in the process. This is a step beyond sloppy — it is the allegation that synthetic output was passed off as primary evidence.

The IOPC referral, per Sky, came from inside the force. That detail is the only good news in the story: somebody noticed, and the internal escalation path worked. Everything else is the bad version of a scenario the legal-tech community has been gaming out in conference panels for two years.

Why it matters

The instinct on a developer feed is to file this under "hallucination," but that framing is wrong and it lets the wrong people off the hook. A hallucination is a model failure. Fabricated evidence is a process failure: the absence of any system that would have caught synthetic text, whether hallucinated or deliberately generated, before it reached a case file. Police forces across the UK and the US have spent the last 18 months greenlighting LLM pilots (report drafting, body-cam transcription, statement summarisation, OSINT triage) without shipping the boring infrastructure that makes those pilots auditable.

Compare this to the world the same officers already live in for digital forensics. If you image a phone with Cellebrite, the tool writes a cryptographic hash of the source media, logs every extraction step, version-stamps the binary, and produces a report a defence expert can replay. That entire discipline — chain of custody, contemporaneous notes, tool validation — exists because courts learned the hard way in the 1990s that "the computer said so" is not evidence. We are about to re-learn the same lesson with transformers, and the Derbyshire case is the opening bell.

The community reaction on Hacker News (301 points, 200+ comments) split predictably into two camps. Forensic practitioners argued the failure is procedural and solvable: log the prompt, log the model and version, hash the inputs and outputs, require a second officer to countersign anything model-touched before it enters disclosure. Civil-liberties commenters argued the failure is categorical and unsolvable: a generative model is by construction a machine for producing plausible text that did not happen, and you cannot bolt evidentiary integrity onto that after the fact. Both camps are right about different things, and the consensus emerging in the thread is that LLM output should be inadmissible as evidence by default and only admissible as a witness's own work product after explicit human attestation.

The regulatory vacuum is the part that should bother anyone building in this space. The UK has no statutory framework for AI-generated material in criminal proceedings. The Criminal Procedure Rules require disclosure of the *process* used to produce evidence, but "I asked Claude" is not a process the courts have a precedent for parsing. The Forensic Science Regulator's codes don't cover LLMs. The College of Policing's AI guidance, published last year, is advisory. So the answer to "what's the standard?" right now is: whatever the officer remembers typing, if they remember, if they wrote it down.

What this means for your stack

If you sell anything to law enforcement, prosecutors, or regulated investigators — even adjacent (eDiscovery, OSINT platforms, case management) — the Derbyshire case just moved provenance from a roadmap item to a procurement requirement. Concretely:

Log the prompt, the model, the version, and the seed. Not a summary. The literal prompt string, the system prompt, the model identifier including minor version, the temperature, the seed where the provider exposes one, and any tool calls. Store it next to the output with a hash linking them. If your vendor can't produce this for an output their model generated 14 months ago, they cannot be in your evidence chain.
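As a rough illustration of what such a log entry could look like, here is a minimal Python sketch. The `GenerationRecord` class, its field names, and the `verify` helper are hypothetical, invented for this example; they are not any vendor's schema. The sketch assumes a SHA-256 digest over a canonical JSON encoding is an acceptable way to link the parameters to the output.

```python
import hashlib
import json
from dataclasses import dataclass, asdict, field, replace
from datetime import datetime, timezone
from typing import Optional, Tuple


def _utc_now() -> str:
    # ISO 8601 timestamp in UTC, recorded when the record is created.
    return datetime.now(timezone.utc).isoformat()


@dataclass(frozen=True)
class GenerationRecord:
    """Hypothetical log entry: everything needed to explain one model output."""
    prompt: str                 # the literal prompt string, not a summary
    system_prompt: str
    model_id: str               # full identifier, including minor version
    temperature: float
    seed: Optional[int]         # None if the provider exposes no seed
    tool_calls: Tuple[str, ...]
    output: str
    created_at: str = field(default_factory=_utc_now)

    def canonical_bytes(self) -> bytes:
        # Sorted keys and fixed separators make the encoding deterministic,
        # so the same record always produces the same digest.
        return json.dumps(
            asdict(self), sort_keys=True, separators=(",", ":")
        ).encode("utf-8")

    def record_hash(self) -> str:
        # One digest covers the parameters and the output together.
        return hashlib.sha256(self.canonical_bytes()).hexdigest()


def verify(record: GenerationRecord, stored_hash: str) -> bool:
    """Return True if the record still matches the digest stored with it."""
    return record.record_hash() == stored_hash


if __name__ == "__main__":
    rec = GenerationRecord(
        prompt="Summarise the attached statement.",
        system_prompt="You are a drafting aid.",
        model_id="example-model-1.2",  # placeholder, not a real model name
        temperature=0.0,
        seed=7,
        tool_calls=(),
        output="Summary text.",
    )
    digest = rec.record_hash()
    print(verify(rec, digest))                          # unchanged record
    print(verify(replace(rec, output="Edited"), digest))  # altered output
```

A digest like this only detects changes made after the record was written. It says nothing about whether the output was true, and a production system would also need write-once storage and trusted timestamps, which are outside this sketch.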

Treat model output as a derived artifact, not a source. The pattern that works, borrowed from forensic imaging: the source document is hashed and stored read-only, the model output is stored separately with a pointer to the source hash and the prompt that produced it, and the human-authored final version is a third artifact with explicit diff against the model output. Three artifacts, three signatures, no ambiguity about who said what.
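The three-artifact pattern can be sketched in Python as follows. All class and function names here (`SourceArtifact`, `ModelArtifact`, `FinalArtifact`, `finalize`) are hypothetical and chosen for illustration. Plain SHA-256 digests stand in for the "signatures"; real cryptographic signing and read-only storage are assumed to live elsewhere.

```python
import difflib
import hashlib
import json
from dataclasses import dataclass
from typing import Tuple


def sha256_hex(text: str) -> str:
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


@dataclass(frozen=True)
class SourceArtifact:
    """Artifact 1: the original document, hashed and never modified."""
    content: str

    @property
    def digest(self) -> str:
        return sha256_hex(self.content)


@dataclass(frozen=True)
class ModelArtifact:
    """Artifact 2: model output, pointing back at the source and the prompt."""
    content: str
    prompt: str
    source_digest: str

    @property
    def digest(self) -> str:
        # Hash output, prompt, and source pointer together so none of the
        # three can be swapped without changing the digest.
        payload = json.dumps([self.content, self.prompt, self.source_digest])
        return sha256_hex(payload)


@dataclass(frozen=True)
class FinalArtifact:
    """Artifact 3: the human-authored version, with a diff against the model."""
    content: str
    author: str
    model_digest: str
    diff_against_model: Tuple[str, ...]


def finalize(model: ModelArtifact, final_text: str, author: str) -> FinalArtifact:
    """Record what the human changed relative to the model output."""
    diff = tuple(
        difflib.unified_diff(
            model.content.splitlines(),
            final_text.splitlines(),
            fromfile="model_output",
            tofile="final",
            lineterm="",
        )
    )
    return FinalArtifact(final_text, author, model.digest, diff)


if __name__ == "__main__":
    source = SourceArtifact("Witness saw a red car.")
    model = ModelArtifact(
        content="Witness saw a red car at 9pm.",
        prompt="Tidy up this statement.",
        source_digest=source.digest,
    )
    final = finalize(model, "Witness saw a red car.", author="officer_a")
    print("\n".join(final.diff_against_model))
```

In the usage example the diff makes visible that the model added a time the source never contained and that the human author removed it, which is exactly the "who said what" question the pattern is meant to answer.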

Build the "second pair of eyes" workflow in. The Derbyshire failure mode — one officer, one prompt, one paste into a case file — is preventable in software by simply not letting model output reach an evidentiary destination without a second authenticated user countersigning it. This is the same control that already exists for arrest authorisations and search warrants. There is no reason synthetic text should be held to a lower standard than a custody decision.
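A countersign gate of this kind could look roughly like the Python sketch below. `PendingItem`, `CaseFile`, and `CountersignError` are hypothetical names made up for this example, and the sketch assumes user identities have already been authenticated upstream; it only enforces the rule that model-touched text needs a second, different user before it is admitted.

```python
from dataclasses import dataclass
from typing import List, Optional


class CountersignError(Exception):
    """Raised when the countersign rule is violated."""


@dataclass
class PendingItem:
    text: str
    author: str                       # assumed to be an authenticated user ID
    model_touched: bool               # True if any model produced or edited it
    countersigned_by: Optional[str] = None


def countersign(item: PendingItem, reviewer: str) -> None:
    """Record a second user's sign-off; the author cannot review their own item."""
    if reviewer == item.author:
        raise CountersignError("author cannot countersign their own item")
    item.countersigned_by = reviewer


class CaseFile:
    """Evidentiary destination that refuses unreviewed model-touched text."""

    def __init__(self) -> None:
        self.entries: List[PendingItem] = []

    def admit(self, item: PendingItem) -> None:
        if item.model_touched and item.countersigned_by is None:
            raise CountersignError("model-touched item needs a countersignature")
        self.entries.append(item)


if __name__ == "__main__":
    case = CaseFile()
    item = PendingItem("Drafted summary.", author="officer_a", model_touched=True)
    try:
        case.admit(item)              # blocked: nobody has countersigned yet
    except CountersignError as exc:
        print("blocked:", exc)
    countersign(item, reviewer="officer_b")
    case.admit(item)                  # allowed after a second user signs
    print(len(case.entries))
```

The design choice worth noting is that the check sits on the destination (`CaseFile.admit`) rather than on the drafting tool, so a paste from any tool hits the same gate. The sketch relies on the `model_touched` flag being set honestly, which in practice would have to come from the provenance log rather than from the author.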

Looking ahead

Expect the IOPC investigation to take 12-18 months and expect at least one appellate ruling out of it that defines what "AI-generated" means in a disclosure context. Expect every defence solicitor in England and Wales to start filing standing disclosure requests asking whether any model touched the prosecution's evidence — because that question costs nothing to ask and the answer is now material. And expect a wave of "AI provenance" startups pitching forces with a story that looks a lot like the Cellebrite playbook from 2003. The ones that win will be the ones that treat the audit log as the product and the model as a commodity, not the other way around.

// read from source

Police officer investigated for using AI to 'create evidence' in multiple cases
