PostHog opts you into AI training by default. The fine print matters.

├── "Opt-in by default for AI training on customer analytics data is a consent violation, especially under GDPR"
│  ├── top10.dev editorial (top10.dev)

The editorial argues that product analytics data is 'one of the highest-signal, lowest-consent datasets in the modern SaaS stack' — containing event streams with user IDs, emails, and session recordings with potential PII. Repurposing data collected under one purpose to train AI models without explicit opt-in is flagged as a tense fit with GDPR Article 6's lawful basis requirements.

│  └── @EU-based PostHog customers (Hacker News)

Multiple EU-based customers in the HN thread raised GDPR concerns, noting that opt-in-by-default conflicts with Article 6 lawful basis requirements when the data being repurposed includes personal data originally collected for product analytics — a different stated purpose than AI model training.

├── "The notification process failed — users shouldn't learn about data policy changes from Hacker News"
│  └── @Long-time PostHog users (Hacker News)

Several long-time PostHog customers in the top HN comments said they only discovered the change because the HN post surfaced it — there was no in-product notification banner and no email to account admins. They argue a silent default flip on a policy this material is a trust-breaking failure of communication, regardless of the underlying AI policy.

├── "Training AI on customer data is necessary to ship better product features, and cooperative opt-in benefits everyone"
│  └── PostHog (PostHog blog)

PostHog frames the default opt-in as necessary to ship better AI features like anomaly detection, natural-language SQL over event data, and automated insights. They argue that opting in is the cooperative move because a better-trained model benefits every customer, and they lean on the self-hostable open-source option as the escape hatch for customers who object.

└── "The self-hosting escape hatch makes the Cloud default less objectionable"
  └── @PostHog (HN thread response) (Hacker News)

PostHog's own response in the HN thread leans heavily on the fact that the opt-in default applies only to PostHog Cloud customers — self-hosters are unaffected because their data never leaves their infrastructure. The argument is that customers who want stronger guarantees already have a no-cost path to them via the open-source deployment.

What happened

PostHog published a blog post announcing that customer data flowing through its product analytics platform will be used to train AI models — and the setting is on by default for existing and new accounts. There is an opt-out toggle in project settings. The post frames the change as necessary to ship better AI features (anomaly detection, natural-language SQL over your event data, automated insights), and argues that opting in is the cooperative move because every customer benefits from a better-trained model.

The Hacker News thread climbed to 195 points within hours, and the top comments are not about AI capability — they are about the default. Several long-time PostHog users noted they only learned about the change because the HN post surfaced it, not because of an in-product notification or an email to account admins. A handful of EU-based customers flagged that opt-in-by-default is, at minimum, a tense fit with GDPR's Article 6 lawful basis requirements when the data being repurposed includes personal data collected under a different original purpose.

PostHog has an open-source core and is self-hostable, which complicates the framing. The opt-in default applies only to PostHog Cloud customers. Self-hosters are unaffected because their data never leaves their infrastructure, a point PostHog's own response in the HN thread leans on heavily.

Why it matters

The surface story is "another vendor is training on customer data." The deeper story is that product analytics data is one of the highest-signal, lowest-consent datasets in the modern SaaS stack, and the industry has never really agreed on who owns it for downstream purposes.

Think about what actually sits in a PostHog instance. Event streams with user IDs, often with email addresses as the distinct_id. Session recordings: replayable captures of users interacting with your product, including any PII they typed into forms that your masking rules failed to catch. Feature flag evaluations tied to user cohorts. Funnel data that reveals your conversion mechanics. This is not log data. This is behavioral data about your end users, who never signed a contract with PostHog. They signed one with you.
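If you do use email addresses as the distinct_id, one mitigation on your side of the pipe is to send a keyed hash instead. Below is a minimal sketch, assuming you control the code that calls your analytics SDK. The function name `pseudonymous_distinct_id` and the key handling are hypothetical and not part of PostHog's API. This is pseudonymization, not anonymization: under GDPR the output is still personal data, because you hold the key.

```python
import hashlib
import hmac


def pseudonymous_distinct_id(email: str, secret_key: bytes) -> str:
    """Derive a stable, non-reversible ID from an email address (hypothetical helper).

    The same email always maps to the same ID, so funnels and cohorts still
    join up, but the analytics vendor never receives the raw address.
    """
    # Normalize so "Ada@Example.com " and "ada@example.com" map to one user.
    normalized = email.strip().lower().encode("utf-8")
    # HMAC rather than a bare SHA-256: without the secret key, an outsider
    # cannot rebuild the mapping by hashing a list of known emails.
    digest = hmac.new(secret_key, normalized, hashlib.sha256).hexdigest()
    # Keep 128 bits of the digest behind a readable prefix.
    return f"u_{digest[:32]}"


if __name__ == "__main__":
    key = b"replace-with-a-secret-from-your-vault"  # hypothetical placeholder
    a = pseudonymous_distinct_id("Ada@Example.com ", key)
    b = pseudonymous_distinct_id("ada@example.com", key)
    print(a == b)    # True: normalization makes these the same user
    print("@" in a)  # False: no raw email leaves your system
```

The design choice is stability over secrecy of the individual: you can still join events for one user across sessions, while the vendor sees only an opaque token.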

That's where the legal chain matters. Your DPA with PostHog likely names them as a processor and you as the controller. Processors can only use data for the purposes the controller specifies. Training a general-purpose AI model on data held as a processor is, in most readings of GDPR and the equivalent state-level US laws, a new purpose that requires either a refreshed lawful basis or a contractual amendment with affirmative customer consent, not a default toggle. PostHog's terms update covers the contractual amendment piece, but it leans on the word "default" to stand in for the affirmative consent piece.
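The distinction this argument turns on can be expressed as a toy rule. A purpose the controller specified is allowed. A new purpose is unlocked only by an explicit yes, and a vendor toggle that ships switched on does not count. The sketch below is a hypothetical model for illustration only: `Purpose`, `ProcessingAgreement`, and `may_process` are invented names. It is not a legal test and not how PostHog implements its setting.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, Set


class Purpose(Enum):
    PRODUCT_ANALYTICS = "product_analytics"
    AI_TRAINING = "ai_training"


@dataclass
class ProcessingAgreement:
    """Hypothetical record of what a controller has authorized a processor to do."""

    # Purposes the controller specified when the data was collected.
    permitted: Set[Purpose] = field(
        default_factory=lambda: {Purpose.PRODUCT_ANALYTICS}
    )
    # Explicit answers for new purposes: True = affirmative yes,
    # False = refused, missing key = never asked.
    consent: Dict[Purpose, bool] = field(default_factory=dict)
    # A vendor-side toggle that ships switched on. It is stored here only
    # to show that the rule below deliberately ignores it.
    vendor_toggle_default_on: bool = True


def may_process(agreement: ProcessingAgreement, purpose: Purpose) -> bool:
    """Allow a purpose only if it was specified or affirmatively consented to."""
    if purpose in agreement.permitted:
        return True
    # Silence is not consent: only an explicit True unlocks a new purpose,
    # whatever state the vendor's default toggle is in.
    return agreement.consent.get(purpose) is True


if __name__ == "__main__":
    agreement = ProcessingAgreement()
    print(may_process(agreement, Purpose.PRODUCT_ANALYTICS))  # True
    print(may_process(agreement, Purpose.AI_TRAINING))  # False: never asked
    agreement.consent[Purpose.AI_TRAINING] = True
    print(may_process(agreement, Purpose.AI_TRAINING))  # True: explicit yes
```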

Compare that with recent moves from peers. Mixpanel has stayed quiet on this front. Amplitude's terms explicitly carve out customer data from model training. Segment (Twilio) restricts it to aggregated, de-identified usage. Heap allows it but requires explicit opt-in. PostHog is the outlier here, and they're betting that the open-source credibility plus the self-host escape hatch covers the optics.

The community reaction is split along predictable lines. One camp argues PostHog has earned trust through open-sourcing the entire platform and the opt-out is real, so this is fine. The other camp argues that "opt-out is fine" is exactly the framing that got us tracking pixels and third-party cookies, and the precedent — a beloved OSS-friendly company flipping the default — is what's actually corrosive.

What this means for your stack

If you run PostHog Cloud, do three things this week. First, log in and flip the AI training toggle off until your legal team has reviewed the new terms. The setting lives in project settings under data management. Because the setting is on by default, doing nothing is itself a decision, and one that may be hard to defend to a regulator or to an enterprise customer auditing your subprocessor list. Second, check whether your own DPA with your end users permits subprocessor use of their data for AI model training; for most B2B SaaS contracts, the answer is no without an amendment. Third, audit your session recording masking rules: if you were sloppy about masking inputs because "it's just PostHog," the threat model just changed.
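For the third step, a cheap first pass is to scan a sample of captured text for values that look like they should have been masked. Below is a minimal sketch, assuming you can export captured text or input values to plain strings. How you export them depends on your setup, and nothing here calls a PostHog API. The patterns are deliberately narrow: email addresses, plus runs of 13 to 19 digits that pass the Luhn checksum used by payment card numbers. `find_unmasked_pii` and `luhn_ok` are hypothetical helpers.

```python
import re
from typing import Iterable, Iterator, Tuple

# Deliberately narrow patterns; extend for your data (phone numbers, national IDs...).
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
# 13-19 digits, optionally separated by single spaces or hyphens,
# starting and ending on a digit.
CARD_RE = re.compile(r"\b\d(?:[ -]?\d){12,18}\b")


def luhn_ok(number: str) -> bool:
    """Return True if the digits in `number` pass the Luhn checksum."""
    digits = [int(c) for c in number if c.isdigit()]
    total = 0
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:  # double every second digit from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def find_unmasked_pii(values: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (kind, match) for values that look like unmasked PII."""
    for value in values:
        for match in EMAIL_RE.findall(value):
            yield ("email", match)
        for match in CARD_RE.finditer(value):
            candidate = match.group(0)
            # The checksum filters out most order numbers and timestamps.
            if luhn_ok(candidate):
                yield ("card_number", candidate)


if __name__ == "__main__":
    sample = [
        "Signed up as ada@example.com",
        "card 4111 1111 1111 1111",   # standard Luhn-valid test number
        "order 1234 5678 9012 3456",  # fails Luhn, so not reported
    ]
    for kind, value in find_unmasked_pii(sample):
        print(kind, value)
    # email ada@example.com
    # card_number 4111 1111 1111 1111
```

A hit points to a masking rule that needs tightening. A clean run proves little, because the patterns only catch the most obvious cases.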

If you're evaluating PostHog right now, this isn't a dealbreaker, but it changes the calculus. The right comparison is no longer PostHog vs. Mixpanel on features and price — it's PostHog Cloud vs. self-hosted PostHog vs. a competitor whose data-use terms are stricter by default. Self-hosting PostHog has gotten meaningfully easier in the last 18 months with the Helm chart and the managed-self-host offerings from cloud resellers; for any team subject to HIPAA, SOC 2 Type II with strict subprocessor controls, or EU data residency requirements, the self-host path is now the lower-friction option even if the sticker price looks worse.

For everyone else: this is a useful prompt to actually read the data-use sections of every analytics, error tracking, and observability vendor you have. Sentry, Datadog, LogRocket, FullStory, and the rest have all quietly updated terms in the last twelve months. PostHog is just the one that's loud about it.

Looking ahead

The interesting question is whether PostHog walks this back. The HN thread is exactly the kind of signal that has caused similar reversals: GitLab on telemetry in 2019, Audacity on data collection in 2021, HashiCorp on the BSL in 2023 (partially). PostHog's CEO has a track record of engaging directly in community threads and adjusting course; watch for a follow-up post within the next week. If the default flips to off, making training a genuine opt-in, and the company ships a clear in-product notification flow for the change, this becomes a footnote. If it doesn't, expect a measurable bump in self-host migrations and a more cautious read on every "we're an OSS company, you can trust us" pitch for the next year.

Hacker News 195 pts 136 comments

PostHog will train AI models with your data (opted-in by default)

JimDabell · Hacker News

“Opt-in by default” is an oxymoron. If it’s default then I haven’t opted into anything. It’s been enabled by default.

Waterluvian · Hacker News

PostHog was a system we set up once, generally don't think about, and review from time to time, providing some occasional value. It was mostly harmless to leave around. But it's apparently yet one more thing we have to be actively suspicious of as it defaults towards an intolerable state. S

sixtyj · Hacker News

Most companies would bury this change in a deceptively boring T&Cs update, but we value transparency, so here's what you need to know in an internet-friendly numbered list: Users on our EU cloud instance are opted out by default. So too users with agreements that prevent training (e.g. BAA, MS

frankest · Hacker News

What a great reminder to build my own analytics and self host. PostHog just lost a customer. They could easily send an email to each customer asking if we want this. The assumption means they have no product intuition about their own customers, let alone the customers of their customers. Bye.

infecto · Hacker News

Thanks for posting. I had been on the fence for the past few months of switching. The new AI products combined with the weird UIs had been irking me for a while. This is the final nail in the coffin. Opt-in is a terrible business model imo.
